Abstract

Dynamic treatment regimes, also known as treatment policies, are increasingly being 
used to operationalize sequential clinical decision making associated with patient care. 
Common approaches to constructing a dynamic treatment regime from data, such as 
Q-learning, employ non-smooth functionals of the data. Therefore, simple inferential 
tasks such as constructing a confidence interval for the parameters in the Q-function 
are complicated by nonregular asymptotics under certain commonly-encountered generative
models. Methods that ignore this nonregularity can suffer from poor performance
in small samples. We construct confidence intervals for the parameters in the
Q-function by first constructing smooth, data-dependent, upper and lower bounds on
these parameters and then applying the bootstrap. The confidence interval is adaptive 
in that although it is conservative for nonregular generative models, it achieves asymp- 
totically exact coverage elsewhere. The small sample performance of the method is 
evaluated on a series of examples and compares favorably to previously published com- 
petitors. Finally, we illustrate the method using data from the Adaptive Interventions 
for Children with ADHD study (Pelham and Fabiano 2008). 

1 Introduction 

Dynamic treatment regimes, also known as treatment policies, are increasingly being used 
to explore how to inform sequential clinical decision making using data. Clinical scientists, 
wanting to develop principled, evidence-based rules for tailoring treatment, have conducted 
Sequential, Multiple Assignment, Randomized Trials (SMART; Lavori and Dawson 2003; 
Murphy 2005; Murphy et al. 2007) in order to evaluate and compare different long-term 
dynamic treatment regimes. In this work, we develop confidence interval methodology that 
can be used to address the following types of scientific questions arising in the development 
of dynamic treatment regimes: "Is there sufficient evidence to conclude that a particular 
treatment is best compared to other treatments when these treatments are considered in the 
context of a dynamic treatment regime?," "Is a particular patient variable useful in tailoring 
treatment, and for which configuration of the patient variables is there sufficient evidence to 
conclude that there exists a unique best treatment option?" 

This work is motivated by our involvement in the Adaptive Interventions for Children 
with Attention Deficit Hyperactivity Disorder (ADHD) study (Center for Children and Families,
SUNY at Buffalo, William E. Pelham PI, IES Grant R324B060045; see also Nahum-Shani
et al. 2010a). ADHD affects an estimated 5%-10% of school aged children, and is
characterized by inattention, hyperactivity, and impulsivity (Pliszka 2007). In the years
preceding the study, clinicians debated the comparative effectiveness of behavioral modification
therapy versus medication as treatment options for ADHD as well as the best sequencing
of these treatments (Pliszka 2007; Pelham and Fabiano 2008). As a consequence,
a SMART trial was conducted with the general aim of estimating the dynamic treatment 
regime that achieves the greatest reduction in ADHD symptoms among school age children. 
This SMART study is composed of two stages. In the first stage, children were randomized 
with equal probability into one of two treatment groups (low-dose behavioral modification 
therapy, low-dose medication). After a burn-in period of eight weeks, children were evaluated 
monthly and at each evaluation deemed either a responder or non-responder. (The opera- 
tionalized definition of nonresponse is given in Nahum-Shani et al. (2010a)). Non-responders 
were immediately re-randomized to either (i) augmentation of treatment, so that the child 
was provided both medication and behavioral modification therapy, or (ii) intensification of 
treatment, so that the child was provided an increased dosage of their current (stage one) 
treatment. Responders were not re-randomized and were provided their current treatment 
at the current dosage level. 

Data collected in a SMART trial like the ADHD study can be used to estimate an optimal 
dynamic treatment regime. This estimation typically uses an extension of regression to 
multistage decision making problems. The extension we consider in this paper is the Q- 
learning algorithm (Watkins and Dayan 1992; Murphy 2005). A variety of other extensions 
exist in the statistical literature (Murphy 2003; Robins 2004; Blatt et al. 2004; Moodie et al. 
2007; Henderson et al. 2009; Zhao et al. 2009). However all of these extensions suffer from 
the same problem of nonregularity that we focus on in this paper (Robins 2004; Moodie and 
Richardson 2007; Henderson et al. 2009; Chakraborty et al. 2009; Moodie et al. 2010). 

In this paper, we provide a method for constructing confidence intervals for parameters
arising in the Q-learning algorithm. The primary challenge is that the estimators
are non-smooth functionals of the data; in particular, the formula for the estimators involves
the max operator, which is non-differentiable. Robins (2004) notes two problems
resulting from the non-differentiability of the max operator. First, while the estimators of
the regression coefficients are consistent, their limiting distributions can have nonzero mean;
that is, there is estimation bias on the order of $1/\sqrt{n}$ for some generative models. Second, the
regression coefficient estimators are nonregular (Bickel 1993; Tsiatis 2006). That is, their
limiting distributions change abruptly as one smoothly varies the underlying generative
model. As a practical consequence, common approaches based on the bootstrap and Taylor
series arguments provide inconsistent interval estimators and can behave poorly in small
samples (Andrews and Ploberger 1994; Andrews 2001, 2002; Leeb and Potscher 2005).
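
The order $1/\sqrt{n}$ bias described above can be seen in a small simulation. The following is a minimal sketch and is not taken from the paper: it swaps the Q-learning estimator for the simplest estimator that shares its non-smoothness, the positive part of a sample mean. The function name `positive_part_bias` and all of its parameters are hypothetical.

```python
import math
import random


def positive_part_bias(n, mu=0.0, sigma=1.0, reps=20000, seed=0):
    """Monte Carlo estimate of E[ sqrt(n) * ([Xbar]_+ - [mu]_+) ].

    Xbar is the mean of n i.i.d. N(mu, sigma^2) draws, so it is sampled
    directly from N(mu, sigma^2 / n) to keep the sketch fast.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        xbar = rng.gauss(mu, sigma / math.sqrt(n))
        total += math.sqrt(n) * (max(xbar, 0.0) - max(mu, 0.0))
    return total / reps


# At mu = 0 the scaled error is the positive part of a standard normal,
# whose mean is 1 / sqrt(2 * pi), so the bias does not vanish.
bias_at_zero = positive_part_bias(n=100, mu=0.0)

# Far from the kink the max operator is locally linear and the bias is
# negligible.
bias_far_from_zero = positive_part_bias(n=100, mu=1.0)
```

The contrast between the two settings is the abrupt change in limiting behavior that makes the estimator nonregular: the scaled bias depends on whether the underlying mean sits exactly at the point of non-differentiability.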

The adaptive confidence interval proposed here is based on smooth, data-dependent, 
upper and lower bounds on the estimators involved in the regression models used by Q- 
learning. Confidence intervals are formed by bootstrapping these bounds. The proposed 
confidence interval is adaptive in that although it is conservative for nonregular generative 
models, it achieves asymptotically exact coverage elsewhere. 

Many authors have focused on reducing the bias of order $1/\sqrt{n}$ discussed above (recall
that under some generative models, the estimators of the regression coefficients can have limiting
distributions with nonzero means, and thus bias). The methods of Moodie and Richardson
(2007), Chakraborty et al. (2009) and Song et al. (2010) reduce the estimation bias via the 
use of thresholding. As is well-known, the use of thresholding (or penalization that induces 
variable selection) leads to nonregular estimation (see Leeb and Potscher 2003, 2005 and 
references therein; Chatterjee and Lahiri 2011). Moodie and Richardson propose the use of 
a hard threshold whereas Chakraborty et al. propose a soft-thresholding method that is mo- 
tivated by an empirical Bayes argument. Song et al. generalize the soft-thresholding method 
by use of a lasso-like penalization. In all cases, linear combinations of some parameters may 
be set to zero as a result of the thresholding. In Chakraborty et al. confidence intervals are 
constructed by use of the bootstrap whereas in Song et al., confidence intervals are produced
via Taylor series arguments. Both methods work well in the simulations provided. However,
the standard bootstrap is inconsistent in nonregular settings (Shao 1994; Beran 1997) and 
confidence intervals based on Taylor series arguments do not capture the variation due to 
variable selection/thresholding (Potscher 1991; Leeb and Potscher 2005). Neither the work 
of Song et al., nor Chakraborty et al. need be consistent under a local alternatives frame- 
work. In contrast, this work provides regular (consistent under local alternatives) confidence 
sets. 

Instead of focusing on bias reduction of the estimator we focus directly on the construc- 
tion of high quality confidence intervals. We do this for several reasons: first, it is known 
that in settings in which there is no unbiased estimator, attempts to eliminate the bias 
at some parameter values must lead to large mean square error at other parameter values 
(Doss and Sethuraman, 1989; Liu and Brown 1993; Chen 2004; Hirano and Porter 2009). 
Simulations provided in the supplementary material (Section 3 of the Supplement) provide 
examples of this excessive mean square error. Second, interval estimators (such as confidence 
intervals) that obtain the desired level of confidence can be used to conduct inference about 
the parameters even when there is bias of the order $1/\sqrt{n}$. Alternate approaches that focus
directly on confidence intervals include two proposals by Robins (2004), the first of which 
is a projection confidence interval. Robins' second proposal and a natural method that we 
call "plug-in pretesting estimation" share some conceptual similarities with the adaptive 
confidence interval proposed here. See Section 4.2 for discussion. 

Section 2 considers the simplest possible setting, in which there are two stages of treat- 
ment and two treatments available at each stage. Here the adaptive confidence interval (ACI) 
is introduced and asymptotic properties are provided. Section 3 generalizes the problem and 
the ACI to the class of problems with two stages of treatment and an arbitrary number of 
treatments at each stage. In Section 4, we provide an empirical comparison of the ACI with 
the bootstrap and the use of thresholding as represented in Chakraborty et al. (2009) on a
number of test cases. The ACI compares favorably. Section 5 contains an application of the
ACI to the analysis of the ADHD study and a discussion of future work. An extension of the 
ACI to an arbitrary number of stages of treatment, and an arbitrary number of treatments 
at each stage, is given in the supplementary material. 

2 Two stages of binary treatment 

In this section, we develop the adaptive confidence interval for the parameters in the Q- 
function when there are two stages of treatment and two treatments are available at each 
stage. We use uppercase letters such as X and A to denote random variables, and lowercase 
letters such as x and a to denote instances of these random variables. The data consist of 
n trajectories drawn i.i.d. from some fixed and unknown distribution P. Each trajectory 
$(X_1, A_1, Y_1, X_2, A_2, Y_2)$ is a sequence of random variables collected at two stages $t = 1, 2$;
$X_t \in \mathbb{R}^{p_t}$ denotes patient measurements collected prior to the $t$th assignment of treatment,
$A_t \in \{1, 2\}$ denotes the binary treatment (also called an action) assigned at stage $t$, and
$Y_t \in \mathbb{R}$ is a measure of patient response following the assignment of treatment at stage $t$.
We assume that $Y_t$ has been coded so that a higher value corresponds to a better clinical
outcome. Let $H_t = \{X_1, A_1, \ldots, X_t\}$ be the patient history, that is, the information available
to the decision maker before the assignment of the $t$th treatment $A_t$. Furthermore, we
assume that the treatments, $A_t$, are randomly assigned to patients at each stage with known
probabilities possibly depending on patient history.

We wish to use data like the above to inform the construction of a Dynamic Treatment
Regime (DTR). A DTR is a sequence of decision rules, one for each stage of treatment, that
takes as input the patient history and gives as output a recommended treatment. More
formally, a DTR $\pi = (\pi_1, \pi_2)$ is an ordered pair of functions $\pi_t$ so that $\pi_t : \mathcal{H}_t \to \{1, 2\}$,
where $\mathcal{H}_t \subseteq \mathbb{R}^{p_t^*}$ is the domain of $H_t$. Let $E^{\pi}$ denote the joint expectation over $H_t, A_t, Y_t$
for $t = 1, 2$ under the restriction that $A_t = \pi_t(H_t)$. The objective is to learn a DTR $\pi$
which comes close to maximizing the expected clinical outcome $E^{\pi}(Y_1 + Y_2)$. One way to
estimate an optimal DTR is using the Q-learning algorithm (Watkins 1989), which can be
conceptualized as an extension of regression to multistage decision making. More precisely,
Q-learning is a form of approximate dynamic programming, where the conditional mean
responses are estimated from the data since they cannot be computed explicitly. We now
describe the Q-learning algorithm with function approximation as in Murphy (2005). To
start, define

$$Q_2(h_2, a_2) = E\left(Y_2 \mid H_2 = h_2, A_2 = a_2\right), \tag{1}$$
$$Q_1(h_1, a_1) = E\left(Y_1 + \max_{a_2 \in \{1,2\}} Q_2(H_2, a_2) \,\Big|\, H_1 = h_1, A_1 = a_1\right); \tag{2}$$

the functions $Q_t(h_t, a_t)$, $t = 1, 2$ are known as Q-functions. At each stage of treatment $t$ the
Q-function reflects the quality (hence the letter "Q") of the treatment $a_t$ given the patient
history $h_t$. If the conditional expectations in the preceding display were known, then dynamic
programming provides an optimal DTR given by $\pi_t^*(h_t) = \arg\max_{a_t \in \{1,2\}} Q_t(h_t, a_t)$. In most
practical settings these mean functions must be approximated from data. In this paper we
consider linear approximations to the conditional mean function. Specifically, we employ a
working model of the form

$$Q_t(h_t, a_t; \beta_t) = \beta_{t,0}^\top h_{t,0} + \beta_{t,1}^\top h_{t,1} 1_{a_t = 1}, \tag{3}$$

where $h_{t,0}$ and $h_{t,1}$ are vectors of features comprising the patient history. Note that according
to the model, if $h_{t,1}^\top \beta_{t,1} \approx 0$ then both treatments $a_t = 1$ and $a_t = 2$ yield approximately the
same response for a patient with history $H_{t,1} = h_{t,1}$. That is, there is not a unique best
treatment for a patient with history $H_{t,1} = h_{t,1}$. Conversely, if $h_{t,1}^\top \beta_{t,1} \not\approx 0$ then exactly one
treatment yields the best expected outcome for a patient with history $H_{t,1} = h_{t,1}$. We use
$\beta_t$ to denote $(\beta_{t,0}^\top, \beta_{t,1}^\top)^\top$. Let $\mathbb{P}_n$ denote the empirical measure. The Q-learning algorithm
proceeds as follows:

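1. Regress $Y_2$ on $H_2$ and $A_2$ using (3) to obtain

   $$\hat\beta_2 = \arg\min_{\beta_2} \mathbb{P}_n \left(Y_2 - Q_2(H_2, A_2; \beta_2)\right)^2,$$

   and subsequently the approximation $Q_2(h_2, a_2; \hat\beta_2)$ to the conditional mean $Q_2(h_2, a_2)$.

2. (a) Define the predicted future reward following the optimal policy as:

   $$\tilde Y_1 = Y_1 + \max_{a_2 \in \{1,2\}} Q_2(H_2, a_2; \hat\beta_2) \tag{4}$$
   $$\phantom{\tilde Y_1} = Y_1 + H_{2,0}^\top \hat\beta_{2,0} + \left[H_{2,1}^\top \hat\beta_{2,1}\right]_+, \tag{5}$$

   where $[z]_+$ denotes the positive part of $z$.
   (b) Regress $\tilde Y_1$ on $H_1$ and $A_1$ using (3) to obtain $\hat\beta_1 = \arg\min_{\beta_1} \mathbb{P}_n \left(\tilde Y_1 - Q_1(H_1, A_1; \beta_1)\right)^2$.

3. Define the estimated optimal DTR as $\hat\pi = (\hat\pi_1, \hat\pi_2)$ so that

   $$\hat\pi_t(h_t) = \arg\max_{a_t \in \{1,2\}} Q_t(h_t, a_t; \hat\beta_t).$$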



Examination of the above procedure makes apparent the close connection between Q-learning
and dynamic programming. For further elaboration see Watkins and Dayan (1992), Murphy
(2005), and Zhao et al. (2009).
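
The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names (`fit_ols`, `q_learning_two_stage`, `estimated_rule`), the argument layout, and the noiseless toy data are all hypothetical, and ordinary least squares via `numpy.linalg.lstsq` stands in for the minimization over the empirical measure.

```python
import numpy as np


def fit_ols(design, response):
    """Least-squares coefficients for response ~ design."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef


def q_learning_two_stage(h10, h11, a1, y1, h20, h21, a2, y2):
    """Two-stage Q-learning under working model (3); treatments coded 1 or 2."""
    # Step 1: stage-two regression on [H20, H21 * 1{A2 = 1}].
    design2 = np.hstack([h20, h21 * (a2 == 1)[:, None]])
    beta2 = fit_ols(design2, y2)
    beta20, beta21 = beta2[: h20.shape[1]], beta2[h20.shape[1]:]

    # Step 2(a): pseudo-outcome (5), using the positive part of H21^T beta21.
    y1_tilde = y1 + h20 @ beta20 + np.maximum(h21 @ beta21, 0.0)

    # Step 2(b): stage-one regression of the pseudo-outcome.
    design1 = np.hstack([h10, h11 * (a1 == 1)[:, None]])
    beta1 = fit_ols(design1, y1_tilde)
    return beta1, beta2


def estimated_rule(h_t1, beta_t1):
    """Step 3: treatment 1 if h_{t,1}^T beta_{t,1} > 0, otherwise treatment 2.

    Ties (a value of exactly zero) are sent to treatment 2 here; this
    tie-breaking choice is an assumption of the sketch.
    """
    return np.where(h_t1 @ beta_t1 > 0, 1, 2)


# Hypothetical noiseless toy data, so the stage-two fit is exact.
rng = np.random.default_rng(0)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
a1, a2 = rng.integers(1, 3, size=n), rng.integers(1, 3, size=n)
h10 = np.column_stack([np.ones(n), x1])
h11 = np.ones((n, 1))
h20 = np.column_stack([np.ones(n), x2])
h21 = np.column_stack([np.ones(n), x2])
y1 = np.zeros(n)
y2 = 1.0 + 0.5 * x2 + (1.0 + x2) * (a2 == 1)

beta1_hat, beta2_hat = q_learning_two_stage(h10, h11, a1, y1, h20, h21, a2, y2)
stage2_rule = estimated_rule(h21, beta2_hat[2:])
```

The call to `np.maximum` in step 2(a) is the only non-smooth operation in the procedure.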

The second stage population coefficients, $\beta_2^*$, satisfy $\beta_2^* = \arg\min_{\beta_2} P \left(Y_2 - Q_2(H_2, A_2; \beta_2)\right)^2$.
Define $\tilde Y_1^* = Y_1 + H_{2,0}^\top \beta_{2,0}^* + \left[H_{2,1}^\top \beta_{2,1}^*\right]_+$; then the first stage population coefficients $\beta_1^*$ are
given by

$$\beta_1^* = \arg\min_{\beta_1} P \left(\tilde Y_1^* - Q_1(H_1, A_1; \beta_1)\right)^2.$$

Notice that $\hat\beta_1$ is a non-smooth function of the trajectories due to the presence of the
$\left[H_{2,1}^\top \hat\beta_{2,1}\right]_+$ term in $\tilde Y_1$; this term, which is due to the maximization of the stage two
Q-function, is the source of the nonregularity in the estimation of optimal dynamic treatment
regimes (see Robins, 2004; Chakraborty et al. 2009).

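The goal of this paper is the development of asymptotically valid confidence intervals
for linear combinations of the first stage coefficients, $c^\top \beta_1^*$. Note that standard methods
are appropriate for construction of confidence intervals for the second stage coefficients.
To better understand the nonregularity and thus the challenge in constructing confidence
intervals for the first stage coefficients, we provide a useful decomposition of $c^\top \sqrt{n}(\hat\beta_1 - \beta_1^*)$.
Define $B_1 = (H_{1,0}^\top, H_{1,1}^\top 1_{A_1 = 1})^\top$ so that instances of $B_1$ form the rows of the design matrix
in the first stage regression. Let $\hat\Sigma_1 = \mathbb{P}_n B_1 B_1^\top$; then examination of the normal equations
shows that $\hat\beta_1 = \hat\Sigma_1^{-1} \mathbb{P}_n B_1 \tilde Y_1$. Hence, for any $c \in \mathbb{R}^{\dim(\beta_1^*)}$ it follows that
$c^\top \sqrt{n}(\hat\beta_1 - \beta_1^*) = c^\top \hat\Sigma_1^{-1} \sqrt{n}\, \mathbb{P}_n B_1 (\tilde Y_1 - B_1^\top \beta_1^*)$, which, using the definition of $\tilde Y_1$, can be further decomposed as

$$c^\top W_n + c^\top \hat\Sigma_1^{-1} \mathbb{P}_n B_1 U_n, \tag{6}$$

where

$$W_n = \hat\Sigma_1^{-1} \sqrt{n}\, \mathbb{P}_n B_1 \left(Y_1 + H_{2,0}^\top \beta_{2,0}^* + \left[H_{2,1}^\top \beta_{2,1}^*\right]_+ - B_1^\top \beta_1^*\right) + \hat\Sigma_1^{-1} \sqrt{n}\, \mathbb{P}_n B_1 H_{2,0}^\top (\hat\beta_{2,0} - \beta_{2,0}^*),$$
$$U_n = \sqrt{n} \left(\left[H_{2,1}^\top \hat\beta_{2,1}\right]_+ - \left[H_{2,1}^\top \beta_{2,1}^*\right]_+\right).$$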

The second term in (6) is non-smooth, which can be seen from the definition of $U_n$. To
illustrate the effect of this non-smoothness, fix $H_{2,1} = h_{2,1}$. If $h_{2,1}^\top \beta_{2,1}^* > 0$, then $U_n |_{H_{2,1} = h_{2,1}}$
is easily seen to be asymptotically normal with mean zero. On the other hand, if $h_{2,1}^\top \beta_{2,1}^* = 0$,
then $U_n |_{H_{2,1} = h_{2,1}} = \left[h_{2,1}^\top \sqrt{n}(\hat\beta_{2,1} - \beta_{2,1}^*)\right]_+$, whose limit is the positive part of a mean-zero
normal random variable and therefore has nonzero mean. As
discussed in the introduction, the nonregularity in the limiting distribution complicates the
construction of confidence intervals for linear combinations of $\beta_1^*$.

The ACI is formed by constructing smooth data-dependent upper and lower bounds on
$c^\top \sqrt{n}(\hat\beta_1 - \beta_1^*)$. Below we construct these bounds by means of a preliminary
hypothesis test that partitions the data into two sets: (i) patients for which there appears to
be a treatment effect, and (ii) patients where it appears there is no treatment effect. The
bounds are formed by bounding the error of the overall approximation due to misclassification
of patients in the partitioning step.

The idea of conducting a preliminary hypothesis test prior to forming estimators or
confidence intervals is known as preliminary testing or pretesting (see Olshen 1973); indeed
estimators formed by thresholding implicitly use a pretest. Pretesting has been used in
econometrics to provide hypothesis tests and confidence intervals in nonregular settings
(Andrews 2001; Andrews and Soares 2007; Cheng 2008; Andrews and Guggenberger 2009). In
these settings, one can identify a small number of problematic parameter values (usually
one value) at which nonregularity occurs. A pretest is constructed with the null hypothesis
that the parameter takes this problematic value. If the pretest rejects, a standard critical
value is used to form the confidence interval; if the pretest does not reject, the maximal critical
value over all possible local alternatives is used to form the confidence interval. In this paper
the situation is somewhat different, since nonregularity occurs for any combination of the
distribution of $H_{2,1}$ and $\beta_{2,1}^*$ for which $P[H_{2,1}^\top \beta_{2,1}^* = 0] > 0$. Thus we take a different
tactic from the simple pretest approach. We conduct a pretest for each individual in the
data set as follows. Define $V_n = \sqrt{n}(\hat\beta_{2,1} - \beta_{2,1}^*)$ and $\hat\Sigma_{21,21}$ to be the plug-in estimator of
the asymptotic covariance matrix of $V_n$. Each pretest is based on

$$T(h_{2,1}) = \frac{n \left(h_{2,1}^\top \hat\beta_{2,1}\right)^2}{h_{2,1}^\top \hat\Sigma_{21,21} h_{2,1}};$$

note that $T(h_{2,1})$ corresponds to the usual test statistic when testing the null hypothesis
$h_{2,1}^\top \beta_{2,1}^* = 0$. The upper bound on $c^\top \sqrt{n}(\hat\beta_1 - \beta_1^*)$ is given by
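$$\mathcal{U}(c) = c^\top W_n + c^\top \hat\Sigma_1^{-1} \mathbb{P}_n B_1 U_n 1_{T(H_{2,1}) > \lambda_n} + \sup_{\gamma \in \mathbb{R}^{\dim(\beta_{2,1}^*)}} c^\top \hat\Sigma_1^{-1} \mathbb{P}_n B_1 \left(\left[H_{2,1}^\top (V_n + \gamma)\right]_+ - \left[H_{2,1}^\top \gamma\right]_+\right) 1_{T(H_{2,1}) \le \lambda_n}, \tag{7}$$

where $\lambda_n$ is a tuning parameter that we discuss in detail below. A lower bound $\mathcal{L}(c)$ can
be defined by replacing the supremum with an infimum. The intuition behind this upper
bound is as follows. Notice that the second term, $c^\top \hat\Sigma_1^{-1} \mathbb{P}_n B_1 U_n$, in (6) is equal to

$$c^\top \hat\Sigma_1^{-1} \mathbb{P}_n B_1 U_n 1_{T(H_{2,1}) > \lambda_n} + c^\top \hat\Sigma_1^{-1} \mathbb{P}_n B_1 \left(\left[H_{2,1}^\top (V_n + \sqrt{n} \beta_{2,1}^*)\right]_+ - \left[H_{2,1}^\top \sqrt{n} \beta_{2,1}^*\right]_+\right) 1_{T(H_{2,1}) \le \lambda_n}. \tag{8}$$

The second term in (8) is algebraically equivalent to $c^\top \hat\Sigma_1^{-1} \mathbb{P}_n B_1 U_n 1_{T(H_{2,1}) \le \lambda_n}$. However, we
have re-expressed $\sqrt{n} H_{2,1}^\top \hat\beta_{2,1}$ as the sum of $H_{2,1}^\top V_n = H_{2,1}^\top \sqrt{n}(\hat\beta_{2,1} - \beta_{2,1}^*)$ and $H_{2,1}^\top \beta_{2,1}^* \sqrt{n}$; the
latter quantity characterizes the degree of nonregularity of $\sqrt{n}(\hat\beta_1 - \beta_1^*)$ (see Theorem 2.1
below). Replacing $\sqrt{n} \beta_{2,1}^*$ with $\gamma$ and taking the supremum over all $\gamma \in \mathbb{R}^{\dim(\beta_{2,1}^*)}$ is one way
of making the second term in (8) insensitive to local perturbations of $\beta_{2,1}^*$. More precisely,
this yields a regular upper bound on the last term in (8). Combining this result with (6)
yields (7). Theorem 2.1 below provides the asymptotic distribution of (7).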
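
The partitioning step that drives the bounds can be sketched as follows. This is an illustration under assumptions, not the authors' code: the function names are hypothetical, the covariance estimate is taken as given, and the threshold `lam=2.0` is an arbitrary placeholder rather than a recommended value of the tuning parameter. The sketch only computes the pretest statistic and the resulting split; it does not carry out the supremum over $\gamma$.

```python
import numpy as np


def pretest_statistic(h21, beta21_hat, sigma_hat, n):
    """Row-wise pretest statistic n * (h^T beta)^2 / (h^T Sigma h).

    h21        : (m, d) array of stage-two interaction features, one row per patient
    beta21_hat : (d,) estimated stage-two interaction coefficients
    sigma_hat  : (d, d) plug-in estimate of the asymptotic covariance of
                 sqrt(n) * (beta21_hat - beta21_star)
    n          : sample size used to fit beta21_hat
    """
    effect = h21 @ beta21_hat
    scale = np.einsum("ij,jk,ik->i", h21, sigma_hat, h21)
    return n * effect**2 / scale


def split_by_pretest(h21, beta21_hat, sigma_hat, n, lam):
    """True where the pretest rejects 'no stage-two treatment effect'."""
    return pretest_statistic(h21, beta21_hat, sigma_hat, n) > lam


# Hypothetical inputs: two patients, identity covariance, n = 100.
h21 = np.array([[1.0, 0.0], [1.0, -1.0]])
beta21_hat = np.array([0.5, 0.5])
sigma_hat = np.eye(2)

t_values = pretest_statistic(h21, beta21_hat, sigma_hat, n=100)
has_effect = split_by_pretest(h21, beta21_hat, sigma_hat, n=100, lam=2.0)
```

In this toy input the first patient has a clearly nonzero estimated effect and lands in the "effect" group, while the second has an estimated effect of exactly zero and lands in the group over which the supremum is taken.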

Suppose we want to construct a $(1 - \alpha) \times 100\%$ confidence interval for $c^\top \beta_1^*$. By construction
of $\mathcal{U}(c)$ and $\mathcal{L}(c)$ it follows that

$$c^\top \hat\beta_1 - \mathcal{U}(c)/\sqrt{n} \le c^\top \beta_1^* \le c^\top \hat\beta_1 - \mathcal{L}(c)/\sqrt{n}.$$

We approximate the distribution of $\mathcal{U}(c)$ and $\mathcal{L}(c)$ using the bootstrap. Let $\hat u$ denote
the $(1 - \alpha/2) \times 100$ percentile of the bootstrap distribution of $\mathcal{U}(c)$, and let $\hat l$ denote the
$(\alpha/2) \times 100$ percentile of the bootstrap distribution of $\mathcal{L}(c)$. Then $[c^\top \hat\beta_1 - \hat u/\sqrt{n},\; c^\top \hat\beta_1 - \hat l/\sqrt{n}]$
is the ACI for $c^\top \beta_1^*$.
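
Given bootstrap draws of the two bounds, the interval endpoints follow directly from the percentile rule just stated. The sketch below assumes the draws have already been computed elsewhere; the function name `aci_from_bootstrap` and the synthetic draws are hypothetical and serve only to show the arithmetic.

```python
import numpy as np


def aci_from_bootstrap(c_beta1_hat, upper_draws, lower_draws, n, alpha=0.05):
    """Interval [c'b1 - u/sqrt(n), c'b1 - l/sqrt(n)] from bootstrap draws.

    upper_draws : bootstrap replicates of the upper bound
    lower_draws : bootstrap replicates of the lower bound
    """
    u_hat = np.quantile(upper_draws, 1.0 - alpha / 2.0)
    l_hat = np.quantile(lower_draws, alpha / 2.0)
    return c_beta1_hat - u_hat / np.sqrt(n), c_beta1_hat - l_hat / np.sqrt(n)


# Synthetic draws chosen so the percentiles are easy to check by hand.
upper_draws = np.arange(0, 101)      # 97.5th percentile is 97.5
lower_draws = np.arange(-100, 1)     # 2.5th percentile is -97.5
interval = aci_from_bootstrap(1.0, upper_draws, lower_draws, n=100)
```

Note that the upper percentile of the upper bound sets the lower endpoint of the interval, and the lower percentile of the lower bound sets the upper endpoint.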

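Next we show that the ACI is asymptotically valid. First define

1. $B_t = (H_{t,0}^\top, H_{t,1}^\top 1_{A_t = 1})^\top$ for $t = 1, 2$;

2. $\Sigma_{t,\infty} = P B_t B_t^\top$ for $t = 1, 2$;

3. $g_2(B_2, Y_2; \beta_2^*) = B_2 (Y_2 - B_2^\top \beta_2^*)$;

4. $g_1(B_1, Y_1, H_2; \beta_1^*, \beta_2^*) = B_1 \left(Y_1 + H_{2,0}^\top \beta_{2,0}^* + \left[H_{2,1}^\top \beta_{2,1}^*\right]_+ - B_1^\top \beta_1^*\right)$.

We use the following assumptions.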

(A1) The histories $H_2$, features $B_1$, and outcomes $Y_t$ satisfy the moment inequalities
$P \|H_2\|^2 \|B_1\|^2 < \infty$ and $P Y_2^2 \|B_2\|^2 < \infty$.

(A2) The matrices $\Sigma_{t,\infty}$ and $\mathrm{Cov}(g_1, g_2)$ are strictly positive definite.

(A3) The sequence $\lambda_n$ tends to infinity and satisfies $\lambda_n = o(n)$.

(A4) For $\gamma^* \in \mathbb{R}^{\dim(\beta_{2,1}^*)}$, there exists $P_n$, a sequence of local alternatives converging to $P$ in
the sense that

$$\int \left[\sqrt{n} \left(dP_n^{1/2} - dP^{1/2}\right) - \tfrac{1}{2} g \, dP^{1/2}\right]^2 \to 0$$

for some measurable function $g$ for which

- if $\beta_{2,n}^* = \arg\min_{\beta_2} P_n \left(Y_2 - Q_2(H_2, A_2; \beta_2)\right)^2$, then $\beta_{2,1,n}^* = \beta_{2,1}^* + \gamma^*/\sqrt{n} + o(1/\sqrt{n})$,
and

- $P_n \|H_2\|^2 \|B_1\|^2$ and $P_n Y_2^2 \|B_2\|^2$ are bounded sequences.

Assumptions (A1)-(A2) are quite mild, requiring only full rank design matrices and some
moment conditions. Requirement (A3) constrains a user-chosen tuning parameter and thus
is always satisfied by appropriate choice of $\lambda_n$. Local alternatives provide a medium through
which a glimpse of small sample behavior can be obtained, while retaining the mathematical
convenience of large samples. Assumption (A4) facilitates a discussion of local alternatives
without attempting to make the weakest possible assumptions (see van der Vaart and Wellner
1996; see also the remarks at the end of this section).

The first result regards the population upper bound $\mathcal{U}(c)$. Define $\tilde Y_{1,n}^* = Y_1 + H_{2,0}^\top \beta_{2,0,n}^* +
\left[H_{2,1}^\top \beta_{2,1,n}^*\right]_+$ and $\beta_{1,n}^* = \arg\min_{\beta_1} P_n \left(\tilde Y_{1,n}^* - Q_1(H_1, A_1; \beta_1)\right)^2$.

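Theorem 2.1 (Validity of population bounds). Assume (A1)-(A3) and fix $c \in \mathbb{R}^{\dim(\beta_1^*)}$.

2. If for each $n$, the underlying generative distribution is $P_n$, which satisfies (A4), then
the limiting distribution of $c^\top \sqrt{n}(\hat\beta_1 - \beta_{1,n}^*)$ is given by the distribution of

$$c^\top W_\infty + c^\top \Sigma_{1,\infty}^{-1} P B_1 H_{2,1}^\top V_\infty 1_{H_{2,1}^\top \beta_{2,1}^* > 0} + c^\top \Sigma_{1,\infty}^{-1} P B_1 \left(\left[H_{2,1}^\top (V_\infty + \gamma^*)\right]_+ - \left[H_{2,1}^\top \gamma^*\right]_+\right) 1_{H_{2,1}^\top \beta_{2,1}^* = 0}. \tag{9}$$

3. The limiting distribution of $\mathcal{U}(c)$ under both $P$ and under $P_n$ is equal to the distribution
of

$$c^\top W_\infty + c^\top \Sigma_{1,\infty}^{-1} P B_1 H_{2,1}^\top V_\infty 1_{H_{2,1}^\top \beta_{2,1}^* > 0} + \sup_{\gamma \in \mathbb{R}^{\dim(\beta_{2,1}^*)}} c^\top \Sigma_{1,\infty}^{-1} P B_1 \left(\left[H_{2,1}^\top (V_\infty + \gamma)\right]_+ - \left[H_{2,1}^\top \gamma\right]_+\right) 1_{H_{2,1}^\top \beta_{2,1}^* = 0}, \tag{10}$$

where $(W_\infty^\top, V_\infty^\top)$ is asymptotically multivariate normal with mean zero.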

See the supplementary material for the proof and the formula for $\mathrm{Cov}(W_\infty, V_\infty)$. Notice
that the limiting distributions of $c^\top \sqrt{n}(\hat\beta_1 - \beta_1^*)$ and $\mathcal{U}(c)$ (or equivalently $\mathcal{L}(c)$) are equal in the
case $H_{2,1}^\top \beta_{2,1}^* \ne 0$ with probability one. That is, when there is a large treatment effect for
almost all patients then the upper (or lower) bound is tight. However, when there is a non-null
subset of patients for which there is no treatment effect, then the limiting distribution
of the upper bound is stochastically larger than the limiting distribution of $c^\top \sqrt{n}(\hat\beta_1 - \beta_1^*)$.
Thus, the ACI adapts to the setting in which all patients experience a treatment effect.

Because the distribution of (9) depends on the local alternative, $\gamma^*$, $\hat\beta_1$ is a nonregular
estimator (van der Vaart and Wellner, 1996). One might hope to construct an estimator of the
distribution of (9) and use this estimator to approximate the distribution of $c^\top \sqrt{n}(\hat\beta_1 - \beta_1^*)$.
However, a consistent estimator of the distribution of (9) does not exist because $P_n$ is
contiguous with respect to $P$ (by assumption A4). To see this, let $F_{\gamma^*}(u)$ be the distribution
of (9) evaluated at a point, $u$. If a consistent estimator, say $\hat F_n(u)$, existed, that is, $\hat F_n(u)$
converges in probability to $F_{\gamma^*}(u)$ under $P_n$, then the contiguity implies that $\hat F_n(u)$ converges
in probability to $F_{\gamma^*}(u)$ under $P$. This is a contradiction (at best $\hat F_n(u)$ converges in
probability to $F_0(u)$ under $P$). Because we cannot consistently estimate $\gamma^*$ and we do not know
the value of $\gamma^*$, the tightest estimable upper bound on (9) is given by (10). As we shall next
see, we are able to consistently estimate the distribution of (10).

In order to form confidence sets, the bootstrap distributions of $\mathcal{U}(c)$ and $\mathcal{L}(c)$ are used.
The next result regards the consistency of these bootstrap distributions. Let $\hat{\mathbb{P}}_n^{(b)}$ denote
the bootstrap empirical measure, that is, $\hat{\mathbb{P}}_n^{(b)} = n^{-1} \sum_{i=1}^{n} M_{n,i} \delta_{\mathcal{T}_i}$ for $(M_{n,1}, M_{n,2}, \ldots, M_{n,n}) \sim
\mathrm{Multinomial}(n, (1/n, 1/n, \ldots, 1/n))$, where $\delta_{\mathcal{T}_i}$ is the point mass at the $i$th trajectory. We use
the superscript $(b)$ to denote that a functional
has been replaced by its bootstrap analogue, so that if $\omega = f(\mathbb{P}_n)$ then $\omega^{(b)} = f(\hat{\mathbb{P}}_n^{(b)})$.
Denote the space of bounded Lipschitz-1 functions on $\mathbb{R}^2$ by $BL_1(\mathbb{R}^2)$. Furthermore, let $E_M$
and $P_M$ denote the expectation and probability with respect to the bootstrap weights. The
following results are proved in the supplemental material.

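Theorem 2.2. Assume (A1)-(A3) and fix $c \in \mathbb{R}^{\dim(\beta_1^*)}$. Then $(\mathcal{U}(c), \mathcal{L}(c))$ and $(\mathcal{U}^{(b)}(c), \mathcal{L}^{(b)}(c))$
converge to the same limiting distribution in probability. That is,

$$\sup_{v \in BL_1(\mathbb{R}^2)} \left| E\, v\big((\mathcal{U}(c), \mathcal{L}(c))\big) - E_M\, v\big((\mathcal{U}^{(b)}(c), \mathcal{L}^{(b)}(c))\big) \right|$$

converges in probability to zero.

Corollary 2.3. Assume (A1)-(A3) and fix $c \in \mathbb{R}^{\dim(\beta_1^*)}$. Let $\hat u$ denote the $(1 - \alpha/2) \times 100$
percentile of $\mathcal{U}^{(b)}(c)$ and $\hat l$ denote the $(\alpha/2) \times 100$ percentile of $\mathcal{L}^{(b)}(c)$. Then

$$P_M \left(c^\top \hat\beta_1 - \hat u/\sqrt{n} \le c^\top \beta_1^* \le c^\top \hat\beta_1 - \hat l/\sqrt{n}\right) \ge 1 - \alpha + o_P(1).$$

Furthermore, if $P(H_{2,1}^\top \beta_{2,1}^* = 0) = 0$, then the above inequality can be strengthened to equality.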

The preceding results show that the ACI can be used to construct valid confidence intervals
regardless of the underlying parameters or generative model. Moreover, in settings where
there is a treatment effect for almost every patient, the ACI delivers asymptotically exact
coverage. See Section 4 for discussion of the choice of the tuning parameter $\lambda_n$.

Remark 2.4. The restriction on $\beta_{2,1,n}^*$ given in assumption (A4) is superfluous and can be
seen to follow as a consequence of the convergence in quadratic mean condition. This is
proved in the supplementary material.

Remark 2.5. Assumption (A2) can be relaxed if one is willing to proceed using generalized
inverses. Since we are in a low dimensional setting we do not pursue this approach further.

3 Extending the ACI to many treatments 

The two stage binary treatment setting which was addressed in the previous section pro- 
vides the tools necessary to analyze data from many SMART trials including the ADHD 
study. However, there are a number of multistage randomized trials in which more than 
two treatments are available at each stage (Rush et al. 2003; Lieberman et al. 2005). In 
this section, we extend the ACI procedure for use with two stage trials with an arbitrary 
number of treatments available at each stage. The organization of this section parallels
that of the previous section; however, the material is presented in a somewhat abbreviated
fashion since much of the intuition has already been provided in earlier sections. In order
to develop the results in this section, we require additional notation. Again, we observe
trajectories $(X_1, A_1, Y_1, X_2, A_2, Y_2)$ drawn i.i.d. from some fixed and unknown distribution
$P$. The treatment actions $A_t$ take values in the set $\{1, \ldots, K_t\}$ for some fixed number of
treatments $K_t$. In typical studies, $K_t$ is no greater than five. We assume that the treatment
action $A_t$ is randomized with probabilities possibly depending on patient history, $H_t$
($H_t = \{X_1, A_1, \ldots, X_t\}$). We use the following linear model for the Q-function at time $t$:

$$Q_t(h_t, a_t; \beta_t) = \sum_{i=1}^{K_t} \beta_{t,i}^\top h_{t,1} 1_{a_t = i}, \tag{11}$$

where as before $h_{t,1}$ is a vector of patient features constructed from the patient history $h_t$,
and $\beta_t = (\beta_{t,1}^\top, \beta_{t,2}^\top, \ldots, \beta_{t,K_t}^\top)^\top$. In (11) we have omitted the main effect term (the term involving
patient features that do not interact with treatment). This constraint permits compact
theoretical expressions, but is unnecessary for the theoretical results. See the simulation
study for the use of a contrast coding. Note that according to this working model, if
$h_{t,1}^\top \beta_{t,i} - \max_{j \ne i} h_{t,1}^\top \beta_{t,j} \approx 0$ for some $1 \le i \le K_t$, then at least two treatments are approximately
optimal for a patient with history $H_{t,1} = h_{t,1}$. That is, there is not a unique best treatment
for a patient with history $H_{t,1} = h_{t,1}$. Conversely, if $\min_{1 \le i \le K_t} \left| h_{t,1}^\top \beta_{t,i} - \max_{j \ne i} h_{t,1}^\top \beta_{t,j} \right| \not\approx 0$,
then exactly one treatment yields the best expected outcome for a patient with history
$H_{t,1} = h_{t,1}$. As before, estimation of the optimal DTR is done using the Q-learning algorithm.
The Q-learning algorithm proceeds as follows:

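1. Regress $Y_2$ on $H_2$ and $A_2$ using (11) to obtain:

   $$\hat\beta_2 = \arg\min_{\beta_2} \mathbb{P}_n \left(Y_2 - Q_2(H_2, A_2; \beta_2)\right)^2,$$

   and subsequently the approximation $Q_2(h_2, a_2; \hat\beta_2)$ to the conditional mean $Q_2(h_2, a_2)$.

2. (a) Define the predicted future reward following the optimal policy as:

   $$\tilde Y_1 = Y_1 + \max_{a_2 \in \{1, 2, \ldots, K_2\}} Q_2(H_2, a_2; \hat\beta_2) \tag{12}$$
   $$\phantom{\tilde Y_1} = Y_1 + \max_{1 \le i \le K_2} H_{2,1}^\top \hat\beta_{2,i}. \tag{13}$$

   (b) Regress $\tilde Y_1$ on $H_1$ and $A_1$ using (11) to obtain $\hat\beta_1 = \arg\min_{\beta_1} \mathbb{P}_n \left(\tilde Y_1 - Q_1(H_1, A_1; \beta_1)\right)^2$.

3. Define the estimated optimal DTR $\hat\pi = (\hat\pi_1, \hat\pi_2)$ so that

   $$\hat\pi_t(h_t) = \arg\max_{a_t \in \{1, 2, \ldots, K_t\}} Q_t(h_t, a_t; \hat\beta_t).$$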

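As before, examination of the normal equations used to construct $\hat\beta_1$ combined with the
definition of $\tilde Y_1$ shows that $c^\top \sqrt{n}(\hat\beta_1 - \beta_1^*)$ can be decomposed as $c^\top W_n + c^\top \hat\Sigma_1^{-1} \mathbb{P}_n B_1 U_n$, where
the definitions of $W_n$ and $U_n$ have been generalized to

$$W_n = \hat\Sigma_1^{-1} \sqrt{n}\, \mathbb{P}_n B_1 \left(Y_1 + \max_{1 \le i \le K_2} H_{2,1}^\top \beta_{2,i}^* - B_1^\top \beta_1^*\right),$$
$$U_n = \sqrt{n} \left(\max_{1 \le i \le K_2} H_{2,1}^\top \hat\beta_{2,i} - \max_{1 \le i \le K_2} H_{2,1}^\top \beta_{2,i}^*\right).$$

The nonregularity of the limiting distribution of $c^\top \sqrt{n}(\hat\beta_1 - \beta_1^*)$ is apparent by noting the
non-differentiable max operator in the definition of $U_n$. Define $V_{n,i} = \sqrt{n}(\hat\beta_{2,i} - \beta_{2,i}^*)$ for
$i = 1, 2, \ldots, K_2$.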

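The upper bound $\mathcal{U}(c)$ used to construct the ACI is given by

$$\mathcal{U}(c) = c^\top W_n + c^\top \hat\Sigma_1^{-1} \mathbb{P}_n B_1 U_n 1_{\min_i T_i(H_{2,1}) > \lambda_n} + \sup_{\gamma} c^\top \hat\Sigma_1^{-1} \mathbb{P}_n B_1 \left(\max_{1 \le i \le K_2} H_{2,1}^\top (V_{n,i} + \gamma_i) - \max_{1 \le i \le K_2} H_{2,1}^\top \gamma_i\right) 1_{\min_i T_i(H_{2,1}) \le \lambda_n}, \tag{14}$$

where $\gamma = (\gamma_1^\top, \gamma_2^\top, \ldots, \gamma_{K_2}^\top)^\top$. The lower bound is formed similarly but with an infimum
instead of a supremum. The test statistic, $T_i(h_{2,1})$, is taken from the "multiple comparisons
with the best" literature (see Hsu 1996 and references therein). This statistic is given by

$$T_i(h_{2,1}) = \frac{n \left(h_{2,1}^\top \hat\beta_{2,i} - \max_{j \ne i} h_{2,1}^\top \hat\beta_{2,j}\right)^2}{h_{2,1}^\top \hat\zeta_i h_{2,1}},$$

where $\hat\zeta_i$ is the usual plug-in estimator of $n \mathrm{Cov}(\hat\beta_{2,i} - \hat\beta_{2,j})$ for $j = \arg\max_{j \ne i} h_{2,1}^\top \hat\beta_{2,j}$,
assuming the index $j$ to be fixed a priori. Notice that $\min_j T_j(h_{2,1})$ should be large if there
is a uniquely optimal treatment for a patient with history $H_{2,1} = h_{2,1}$. On the other hand,
$T_i(h_{2,1})$ should be small if treatment $i$ is the optimal treatment for a patient with history
$h_{2,1}$ and there is more than one best treatment.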

The theoretical results presented for the binary treatment ACI, including those regarding
the bootstrap of the upper and lower bounds, hold in the many treatment case as well. While
there is no qualitative change in the required assumptions, they must be generalized
to accommodate an arbitrary number of treatments. The generalized assumptions along with
statements of the theorems in the many treatment case can be found in the supplementary
material (Supplement Section 2).

4 Empirical Study 

In this section we contrast different choices of the potentially important tuning parameter $\lambda_n$
and we provide an empirical evaluation of the ACI. Nine generative models are used in these
evaluations; each of these generative models has two stages of treatment and two treatments
at each stage. Generically, each of the models can be described as follows:

• $X_i \in \{-1, 1\}$, $A_i \in \{-1, 1\}$ for $i \in \{1, 2\}$

• $P(A_1 = 1) = P(A_1 = -1) = 0.5$, $P(A_2 = 1) = P(A_2 = -1) = 0.5$

• $X_1 \sim \mathrm{Bernoulli}(0.5)$, $X_2 \mid X_1, A_1 \sim \mathrm{Bernoulli}(\mathrm{expit}(\delta_1X_1 + \delta_2A_1))$

• $Y_1 \equiv 0$,
  $Y_2 = \gamma_1 + \gamma_2X_1 + \gamma_3A_1 + \gamma_4X_1A_1 + \gamma_5A_2 + \gamma_6X_2A_2 + \gamma_7A_1A_2 + \epsilon$, $\epsilon \sim N(0, 1)$

where $\mathrm{expit}(x) = e^x/(1 + e^x)$. This class is parameterized by nine values $\gamma_1, \gamma_2, \ldots, \gamma_7, \delta_1, \delta_2$.
The analysis model uses patient feature vectors defined by:

$$H_{2,0} = (1, X_1, A_1, X_1A_1, X_2)^T, \qquad H_{2,1} = (1, X_2, A_1)^T, \qquad H_{1,0} = H_{1,1} = (1, X_1)^T.$$

Our analysis models are given by $Q_2(H_2, A_2; \beta_2) = H_{2,0}^T\beta_{2,0} + H_{2,1}^T\beta_{2,1}A_2$ and
$Q_1(H_1, A_1; \beta_1) = H_{1,0}^T\beta_{1,0} + H_{1,1}^T\beta_{1,1}A_1$. We use contrast encoding for $A_1$ and $A_2$ to allow for a comparison
with Chakraborty et al. (2009).
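
As an illustration of the model class described above, the following sketch draws trajectories from it. It assumes that "Bernoulli" variables take values in $\{-1, 1\}$ (success mapped to 1, failure to $-1$), in line with the stated supports of $X_i$ and $A_i$; the function name `simulate` and the dictionary layout are hypothetical choices made for this example only.

```python
import numpy as np


def expit(x):
    """Logistic function e^x / (1 + e^x)."""
    return 1.0 / (1.0 + np.exp(-x))


def simulate(gamma, delta, n, rng):
    """Draw n two-stage trajectories from the generative model class.

    Assumption: all binary variables are coded as -1/1.
    """
    gamma = np.asarray(gamma, dtype=float)
    a1 = rng.choice([-1, 1], size=n)          # P(A1 = 1) = 0.5
    a2 = rng.choice([-1, 1], size=n)          # P(A2 = 1) = 0.5
    x1 = rng.choice([-1, 1], size=n)          # X1 ~ Bernoulli(0.5) on {-1, 1}
    p_x2 = expit(delta[0] * x1 + delta[1] * a1)
    x2 = np.where(rng.random(n) < p_x2, 1, -1)
    y1 = np.zeros(n)                          # Y1 is identically zero
    eps = rng.standard_normal(n)
    y2 = (gamma[0] + gamma[1] * x1 + gamma[2] * a1 + gamma[3] * x1 * a1
          + gamma[4] * a2 + gamma[5] * x2 * a2 + gamma[6] * a1 * a2 + eps)
    return {"x1": x1, "a1": a1, "x2": x2, "a2": a2, "y1": y1, "y2": y2}


# One training set of size 150 from the parameter setting of Example 3.
data = simulate(gamma=[0, 0, -0.5, 0, 0.5, 0, 0.5], delta=[0.5, 0.5],
                n=150, rng=np.random.default_rng(0))
```

Passing the random generator explicitly keeps repeated Monte Carlo draws reproducible.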

The form of this class of generative models is useful as it allows us to influence the
degree of nonregularity present in our example problems through the choice of the $\gamma_i$ and
$\delta_i$, and in turn evaluate performance in these different scenarios. Recall that in Q-learning,
nonregularity occurs when more than one stage-two treatment produces nearly the same
optimal expected reward for a set of patient histories that occur with positive probability.
In the model class above, this occurs if the model generates histories for which
$\gamma_5A_2 + \gamma_6X_2A_2 + \gamma_7A_1A_2 \approx 0$, i.e., if it generates histories for which $Q_2$ depends weakly or not
at all on $A_2$. By manipulating the values of $\gamma_i$ and $\delta_i$, we can control i) the probability of
generating a patient history such that $\gamma_5A_2 + \gamma_6X_2A_2 + \gamma_7A_1A_2 = 0$, and ii) the standardized
effect size $E[(\gamma_5 + \gamma_6X_2 + \gamma_7A_1)/\sqrt{\mathrm{Var}(\gamma_5 + \gamma_6X_2 + \gamma_7A_1)}]$. Each of these quantities, denoted
by $p$ and $\phi$ respectively, can be thought of as measures of nonregularity.
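
The two nonregularity measures can be computed exactly by enumerating the eight possible values of $(X_1, A_1, X_2)$. The sketch below does this under the assumption that all binary variables are coded $-1/1$; the function name `regularity_measures` is hypothetical. It returns the probability of a history with exactly zero stage-two treatment effect and the standardized effect size (infinite when the effect is a nonzero constant, undefined when it is identically zero).

```python
import itertools
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def regularity_measures(gamma, delta):
    """Return (p, phi) for a parameter setting, by exact enumeration.

    Assumption: X1, A1, X2 take values in {-1, 1}.
    """
    g5, g6, g7 = gamma[4], gamma[5], gamma[6]
    p_zero = mean = second = 0.0
    for x1, a1, x2 in itertools.product((-1, 1), repeat=3):
        p_x2_is_one = expit(delta[0] * x1 + delta[1] * a1)
        prob = 0.25 * (p_x2_is_one if x2 == 1 else 1.0 - p_x2_is_one)
        effect = g5 + g6 * x2 + g7 * a1   # coefficient multiplying A2
        if abs(effect) < 1e-12:
            p_zero += prob
        mean += prob * effect
        second += prob * effect ** 2
    var = second - mean ** 2
    if var > 1e-12:
        phi = mean / math.sqrt(var)
    else:
        phi = math.inf if abs(mean) > 1e-12 else math.nan
    return p_zero, phi


p3, phi3 = regularity_measures([0, 0, -0.5, 0, 0.5, 0, 0.5], [0.5, 0.5])
p5, phi5 = regularity_measures([0, 0, -0.5, 0, 1.0, 0.5, 0.5], [1.0, 0.0])
```

For the settings labeled Example 3 and Example 5 this gives $p = 1/2$, $\phi = 1$ and $p = 1/4$, $\phi = \sqrt{2} \approx 1.41$.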

Table 1 provides the parameter settings; the first six settings were considered by Chakraborty
et al. (2009), and are described by them as "nonregular", "near-nonregular", and "regular".
To these six, we have added three additional examples labeled A, B, and C. Example A is
an example of a strongly regular setting. Example B is an example of a nonregular setting
where the nonregularity is strongly dependent on the stage 1 treatment action. In example
B, for histories with $A_1 = 1$, there is a moderate effect of $A_2$ at the second stage. However,
for histories with $A_1 = -1$, there is no effect of $A_2$ at the second stage, i.e., both actions
at the second stage are equally optimal. In example C, for histories with $A_1 = 1$, there is a
moderate effect of $A_2$, and for histories with $A_1 = -1$, there is a small effect of $A_2$. Thus
example C is a "near-nonregular" setting that behaves similarly to example B.

Example   $\gamma$                               $\delta$        Type              Regularity measures
1         (0, 0, 0, 0, 0, 0, 0)^T                (0.5, 0.5)^T    nonregular        p = 1      $\phi$ = 0/0
2         (0, 0, 0, 0, 0.01, 0, 0)^T             (0.5, 0.5)^T    near-nonregular   p = 0      $\phi$ = $\infty$
3         (0, 0, -0.5, 0, 0.5, 0, 0.5)^T         (0.5, 0.5)^T    nonregular        p = 1/2    $\phi$ = 1.0
4         (0, 0, -0.5, 0, 0.5, 0, 0.49)^T        (0.5, 0.5)^T    near-nonregular   p = 0      $\phi$ = 1.02
5         (0, 0, -0.5, 0, 1.0, 0.5, 0.5)^T       (1.0, 0.0)^T    nonregular        p = 1/4    $\phi$ = 1.41
6         (0, 0, -0.5, 0, 0.25, 0.5, 0.5)^T      (0.1, 0.1)^T    regular           p = 0      $\phi$ = 0.35
A         (0, 0, -0.25, 0, 0.75, 0.5, 0.5)^T     (0.1, 0.1)^T    regular           p = 0      $\phi$ = 1.035
B         (0, 0, 0, 0, 0.25, 0, 0.25)^T          (0, 0)^T        nonregular        p = 1/2    $\phi$ = 1.00
C         (0, 0, 0, 0, 0.25, 0, 0.24)^T          (0, 0)^T        near-nonregular   p = 0      $\phi$ = 1.03

Table 1: Parameters indexing the example models.



4.1 The choice of $\lambda_n$

We measure and compare the performance of five choices of the tuning parameter $\lambda_n$ in
terms of estimated coverage and average interval diameter. The intervals are constructed for
the intercept and the coefficient of the treatment indicator in the first stage Q-function in the
nine generative models. We use a training set size of $n = 150$ in order to mimic the sample
size of the ADHD study ($n = 138$). The online supplement contains a number of additional
examples and sample sizes, all displaying trends similar to those presented here (Supplement Part
V).

For the sequence $\lambda_n$ we consider the following settings: $\lambda_n = \sqrt{\log\log n}$, $\log\log n$, $\log n$, $\sqrt{n}$, $n$.
The intuition behind these settings is as follows. The supremum (infimum) used in the ACI
can be thought of as controlling the influence of committing a Type II error in the test of
$H_0(h_{2,1}) : h_{2,1}^T\beta^*_{2,1} = 0$. On the other hand, the Type I error is controlled by the choice
of $\lambda_n$. Recall that we reject the hypothesis $H_0(h_{2,1})$ when $T(h_{2,1}) > \lambda_n$. Thus, it is of
interest to examine the (uniform) behaviour of $T(h_{2,1})/\lambda_n$ across the set of $h_{2,1}$ for which
$H_0(h_{2,1})$ is true. Since the test statistic $T$ is scale invariant (e.g. for any $a > 0$ we have
$T(ah_{2,1}) = T(h_{2,1})$), it suffices to restrict our attention to unit vectors $h_{2,1}$ satisfying $H_0(h_{2,1})$.
We let $\mathcal{W} = \{h_{2,1} \in \mathbb{R}^{\dim(\beta^*_{2,1})} : h_{2,1}^T\beta^*_{2,1} = 0, \|h_{2,1}\| = 1\}$ denote these vectors of interest.
Provided that $\lambda_n$ tends to $\infty$ it follows that $\sup_{h \in \mathcal{W}} T(h)/\lambda_n \to 0$ in probability.
Furthermore, if $\lambda_n$ grows faster than $\log\log n$ then the above convergence can be strengthened
from in probability to almost surely using the law of the iterated logarithm (see Csorgo
and Rosalsky 2003). However, consistency of the ACI also requires that $\lambda_n = o(n)$. Thus,
$\lambda_n = n$ represents a rate that is too fast for consistency to hold; $\lambda_n = \log n$ is fast enough for
strong (almost sure) control of the Type I error; $\lambda_n = \log\log n$ represents a rate that is at the
boundary between almost sure convergence and convergence in probability; $\lambda_n = \sqrt{\log\log n}$ represents
a rate that only ensures convergence in probability; $\lambda_n = \sqrt{n}$ represents a non-logarithmic
rate that meets the consistency condition.
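
To give a sense of scale, the five candidate thresholds can be evaluated at the training set size used in the simulations. This is a minimal sketch; the function name `lambda_settings` is hypothetical and natural logarithms are assumed.

```python
import math


def lambda_settings(n):
    """Evaluate the five candidate tuning sequences at sample size n."""
    loglog = math.log(math.log(n))  # natural logarithms assumed
    return {
        "sqrt(log log n)": math.sqrt(loglog),
        "log log n": loglog,
        "log n": math.log(n),
        "sqrt(n)": math.sqrt(n),
        "n": float(n),
    }


settings = lambda_settings(150)
for name, value in settings.items():
    print(f"{name:>16s} = {value:8.3f}")
```

At $n = 150$ the five thresholds are strictly increasing in the order listed, so a larger rate means the pretest rejects less often and the bounds are used for more patient histories.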

Tables (2) and (3) show the estimated coverage and interval diameter of the ACI across
the five parameter settings for the nine generative models. The results appear stable across
choices of $\lambda_n$ for which the ACI is consistent. However, the ACI becomes quite conservative
when $\lambda_n$ is allowed to grow faster than $\log\log n$. Both in the simulation studies below as
well as in the data analysis, we use $\lambda_n = \log\log n$.

4.2 An Evaluation of the ACI 

We compare the empirical performance of the ACI with the centered percentile bootstrap 
(CPB), the soft-thresholding (ST) method of Chakraborty et al. (2009), and the simple plug- 
in pretesting estimator (PPE). The hard-thresholding of Moodie and Richardson (2007) is 
similar in theory and performance to the soft-thresholding approach; furthermore in orthogo- 
nal settings the lasso type penalization of Song et al. (2010) is equivalent to soft-thresholding, 
and so, Chakraborty's method is used to represent these alternate approaches. 

The performance of each method is measured in terms of estimated coverage and interval
diameter. We shall see that the ACI is conservative when there is no stage 2 treatment
effect for all feature patterns; this is not unexpected since the ACI is based on the use of
the upper/lower bounds. Despite the use of the bounds, the ACI routinely delivers close to the
nominal coverage and possesses competitive diameters. Competing methods fail to attain
nominal coverage on many of the examples.

$\beta_{1,1,1}$        Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
$\lambda_n =$          NR      NNR     NR      NNR     NR      R       R       NR      NNR
$\sqrt{\log\log n}$    0.989   0.987   0.967   0.969   0.954   0.952   0.950   0.962   0.962
$\log\log n$           0.992   0.992   0.968   0.972   0.957   0.955   0.950   0.964   0.965
$\log n$               0.993   0.994   0.975   0.976   0.962   0.966   0.959   0.969   0.972
$\sqrt{n}$             [values illegible in source]
$n$                    [values illegible in source]

$\beta_{1,0,1}$        Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
$\lambda_n =$          NR      NNR     NR      NNR     NR      R       R       NR      NNR
$\sqrt{\log\log n}$    0.952   0.962   0.952   0.954   0.950   0.953   0.947   0.952   0.954
$\log\log n$           0.956   0.964   0.954   0.955   0.950   0.957   0.948   0.956   0.957
$\log n$               0.970   0.974   0.961   0.964   0.950   0.966   0.959   0.965   0.968
$\sqrt{n}$             0.971   0.975   0.963   0.968   0.954   0.973   0.965   0.974   0.978
$n$                    0.971   0.975   0.987   0.987   0.979   0.980   0.975   0.983   0.984

Table 2: Monte Carlo estimates of coverage probabilities for the ACI methods at the 95%
nominal level. Here, $\beta_{1,1,1}$ denotes the main effect of treatment and $\beta_{1,0,1}$ denotes the
intercept. Estimates are constructed using 1000 datasets of size 150 drawn from each model,
and 1000 bootstraps drawn from each dataset. Estimates significantly below 0.95 at the 0.05
level are marked with *. Models have two treatments at each of two stages. Examples are
designated NR = nonregular, NNR = near-nonregular, R = regular.

$\beta_{1,1,1}$        Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
$\lambda_n =$          NR      NNR     NR      NNR     NR      R       R       NR      NNR
$\sqrt{\log\log n}$    0.490   0.490   0.481   0.481   0.483   0.471   0.474   0.484   0.484
$\log\log n$           0.502   0.502   0.488   0.488   0.487   0.475   0.477   0.491   0.491
$\log n$               0.557   0.557   0.518   0.518   0.503   0.495   0.492   0.523   0.523
$\sqrt{n}$             0.583   0.582   0.533   0.533   0.513   0.514   0.511   0.540   0.540
$n$                    0.586   0.586   0.538   0.538   0.525   0.521   0.519   0.543   0.543

$\beta_{1,0,1}$        Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
$\lambda_n =$          NR      NNR     NR      NNR     NR      R       R       NR      NNR
$\sqrt{\log\log n}$    0.506   0.506   0.481   0.481   0.483   0.490   0.474   0.490   0.490
$\log\log n$           0.518   0.518   0.487   0.487   0.486   0.494   0.476   0.497   0.498
$\log n$               0.574   0.574   0.517   0.517   0.502   0.517   0.493   0.540   0.541
$\sqrt{n}$             0.596   0.596   0.536   0.536   0.515   0.543   0.519   0.571   0.572
$n$                    0.598   0.598   0.576   0.576   0.565   0.586   0.565   0.579   0.579

Table 3: Monte Carlo estimates of mean width of the ACI method at the 95% nominal level.
Here, $\beta_{1,1,1}$ denotes the main effect of treatment and $\beta_{1,0,1}$ denotes the intercept. Estimates
are constructed using 1000 datasets of size 150 drawn from each model, and 1000 bootstraps
drawn from each dataset. Estimates with corresponding coverage significantly below 0.95
at the 0.05 level are marked with *. Models have two treatments at each of two stages.
Examples are designated NR = nonregular, NNR = near-nonregular, R = regular.

Two additional alternative confidence sets, both proposed in Robins (2004), are the
projection confidence set and a second confidence set that is based on a preliminary confidence
set as opposed to the pretest considered here. Both proposals take advantage of the fact that
the construction of a locally efficient score test for any fixed value of the vector $(\beta_{2,1}^T, \beta_1^T)$ is
possible. In the first method a joint confidence set for $(\beta_{2,1}^T, \beta_1^T)$ is constructed by inverting
the locally efficient score test. Then this joint confidence set is projected to form a
confidence set for $c^T\beta_1$. Projection confidence intervals are generally conservative (Scheffe 1959;
Nickerson 1994); that is, these confidence sets possess greater than the desired confidence
level even when the problem is regular. As a result Robins proposes a second method that
utilizes a "preliminary" confidence set. This preliminary $1 - \epsilon$ joint confidence set is for
$(\beta_{2,1}^T, (C^{\perp T}\beta_1)^T)$ where the columns of $C^{\perp}$ are orthogonal to $c$ and the matrix $[c, C^{\perp}]$ is of full
rank with number of columns equal to the dimension of $\beta_1$. For example this preliminary
confidence set might be a projection of the confidence set for $(\beta_{2,1}^T, \beta_1^T)$. Next, assuming
that $(\beta_{2,1}^*, C^{\perp T}\beta_1^*)$ is known to be $(\beta_{2,1}, C^{\perp T}\beta_1)$, a locally efficient score test can be constructed
for any fixed value of $c^T\beta_1$. For each value of $(\beta_{2,1}, C^{\perp T}\beta_1)$ in the preliminary confidence set,
the locally efficient score test (at level $\alpha$) is inverted. That is, the $1 - \alpha - \epsilon$ asymptotic
level confidence set contains all values of $c^T\beta_1$ for which the locally efficient score test would
accept for at least one value of $(\beta_{2,1}, C^{\perp T}\beta_1)$ in the preliminary confidence set. These two
approaches take advantage of the fact that if $\beta_{2,1}^*$ were known then inference for $\beta_1^*$ would
be regular (thus the existence of the locally efficient score test).

To our knowledge neither method has been implemented (either in simulation or with
data). Both pose difficult computational challenges that must be addressed prior to
implementation. Both are nonconvex optimization problems and the projection confidence set
may be the union of disjoint sets. As a result these two proposals are not evaluated here
(see, however, the discussion for further comments).

We now briefly describe the three methods that we compare with the ACI. A natural
first method to try is the bootstrap; thus the centered percentile bootstrap serves as a
useful baseline for comparison. As discussed, the ST method works by shrinking the fitted
regression $\hat{\beta}_1$ in the hopes of mitigating bias induced by nonregularity. In particular, for the
working models we consider in this section, the ST estimators are:

$$\hat{\beta}_1^{ST} = \arg\min_{\beta_1} P_n(\tilde{Y}_1^{ST} - B_1^T\beta_1)^2, \qquad (15)$$

$$\tilde{Y}_1^{ST} = Y_1 + H_{2,0}^T\hat{\beta}_{2,0} + |H_{2,1}^T\hat{\beta}_{2,1}| \left(1 - \frac{3H_{2,1}^T\hat{\Sigma}_{21,21}H_{2,1}}{n(H_{2,1}^T\hat{\beta}_{2,1})^2}\right)_+. \qquad (16)$$

In the above display, $B_1$ and $\hat{\Sigma}_{21,21}$ are as described in previous sections. The constant
3 appearing in the ST method is motivated by an empirical Bayes interpretation of the
thresholding (see work by Chakraborty et al. (2009) for more details). The form of the
ST method shows that the modified predicted future reward following the optimal policy is
shrunk most heavily when $h_{2,1}^T\hat{\beta}_{2,1}$ is small in magnitude; that is, shrinkage occurs when there is
little evidence that one treatment differs significantly from another for a patient with history
$H_{2,1} = h_{2,1}$. The ST method is only developed for binary treatment.

The PPE confidence interval, in the two-stage binary treatment case, is formed by
bootstrapping

$$c^T\mathbb{W}_n + c^T\hat{\Sigma}_1^{-1}P_nB_1\mathbb{U}_n 1_{T(H_{2,1}) > \lambda_n} + c^T\hat{\Sigma}_1^{-1}P_nB_1[H_{2,1}^T\mathbb{V}_n]_+ 1_{T(H_{2,1}) \le \lambda_n}. \qquad (17)$$

This approach is natural as it partitions the data using a pretest and then uses a different
estimator on each partition. A similar idea was employed by Chatterjee and Lahiri (2009)
in their treatment of the Lasso. However, this approach is consistent under fixed but not
local alternatives (see the supplemental material, Remark 1.9 and surrounding discussion, for
additional details; see also Leeb and Potscher 2005). As we will see below, this leads to
rather poor small sample performance. The primary reason for including this method is to
motivate the importance of local alternatives and the utility of the supremum (infimum) in
the construction of the ACI.
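
The soft-thresholding idea can be made concrete with a short sketch. It assumes the shrinkage form $|h^T\hat{\beta}_{2,1}|\,(1 - 3h^T\hat{\Sigma}_{21,21}h / (n(h^T\hat{\beta}_{2,1})^2))_+$ for the treatment-dependent part of the stage-one pseudo-outcome, which is our reading of the ST method of Chakraborty et al. (2009); the function name and argument names are hypothetical.

```python
import numpy as np


def soft_threshold_pseudo_outcome(y1, h20, h21, beta20, beta21, sigma21, n):
    """Stage-one pseudo-outcome with a soft-thresholded treatment contrast.

    Assumed form: y1 + h20'b20 + |h21'b21| * (1 - 3 h21'S h21 / (n (h21'b21)^2))_+
    where S / n is the estimated covariance of the stage-two contrast coefficients.
    """
    main = h20 @ beta20
    contrast = h21 @ beta21
    # Row-wise quadratic form h' S h.
    quad = np.einsum("ij,jk,ik->i", h21, sigma21, h21)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(contrast != 0, 3.0 * quad / (n * contrast ** 2), np.inf)
    shrink = np.clip(1.0 - ratio, 0.0, None)   # positive part
    return y1 + main + np.abs(contrast) * shrink


# Two toy histories: one with contrast 1.0, one with contrast exactly 0.
h20 = np.array([[1.0], [1.0]])
h21 = np.array([[1.0, 1.0], [1.0, -1.0]])
out = soft_threshold_pseudo_outcome(
    y1=np.zeros(2), h20=h20, h21=h21,
    beta20=np.array([2.0]), beta21=np.array([0.5, 0.5]),
    sigma21=np.eye(2), n=150,
)
```

In this toy input the first history keeps 96% of its contrast while the second, with no estimated treatment effect, is shrunk to zero.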

We first provide confidence intervals for the coefficient of $A_1$ (the treatment variable),
$\beta_{1,1,1}$, in settings in which there are two or three treatments at stage 2. Note that given the
working models and generative models defined by the parameter settings in Table 1, we can
determine the exact value of any parameter $c^T\beta_1^*$ of interest. The supplementary material
(Supplement Section 4) contains confidence intervals for the treatment effect when $X_1 = 1$
(e.g. $\beta_{1,1,1} + \beta_{1,1,2}$). In addition, it contains estimated coverage probabilities and interval
diameters for a range of sample sizes and a number of additional generative models, including
those with three stages of treatment (Supplement Section 4).

Table 4 shows the estimated coverage for the coefficient of $A_1$, $\beta_{1,1,1}$. This simulation
uses a sample size of 150, a total of 1000 Monte Carlo replications and 1000 bootstrap
samples. Target coverage is 0.95. The CPB and PPE methods fare the worst in terms
of coverage, each falling significantly below nominal coverage on fourteen of the eighteen
examples. The ST method fails to cover for examples A, B and C. The reason
for this underperformance is that the ST method tends to over-shrink when treatment effects
are larger, as is the case in all of these examples. Recall that the ST method has not been
developed for the setting in which there are more than two treatments at the second stage.
The ACI delivers nominal coverage on all of the eighteen examples. The ACI is conservative
on examples one and two. The average interval diameters are shown in Table 5. The ACI is
the most conservative, as is to be expected given that it is based on upper and lower bounds.
However, the width is non-trivial.

The coefficient of $A_1$ is perhaps most relevant from a clinical perspective. However,
from a methodological point of view, other contrasts can be illuminating. Table 6 shows
the estimated coverage for the intercept using the same generative models described in the
preceding paragraph. The coverage of competing methods is quite poor, collectively attaining
nominal coverage on two examples. Particularly disturbing is that the ST method falls more
than 30% below nominal levels. In contrast, the ACI delivers nominal coverage on all but
two examples. Table 7 shows the average interval widths; the ACI is the widest but again is
non-trivial.

5 Analysis of the ADHD study 

In this section we illustrate the use of the ACI on data from the Adaptive Interventions
for Children with ADHD study (Nahum-Shani et al. 2010a). The ADHD data we use here
consist of $n = 138$ trajectories. These $n = 138$ trajectories are a subset of the original
$N = 155$ observations. This subset was formed by removing the $N - n = 17$ patients
that were either never randomized to an initial (first stage) treatment (14 patients), or had
massive item missingness (3 patients). Each of the variables is described in
Table (8). Notice that the outcomes $Y_1$ and $Y_2$ satisfy $Y_1 + Y_2 = R$, where $R$ is the teacher
reported TIRS5 score at the last week of the study (week 32).

The first step in using Q-learning is to estimate a regression model for the second
stage; this analysis only uses data from patients that were re-randomized during the 32
week study. Of the $n = 138$ patients, 79 were re-randomized before the study conclusion.
The feature vectors at the second stage are $H_{2,0} = (1, X_{1,1}, X_{1,2}, X_{2,2}, X_{1,3}, X_{2,1}, A_1)^T$ and
$H_{2,1} = (1, X_{2,1}, A_1)^T$. Thus, the Q-function $Q_2(H_2, A_2; \beta_2) = H_{2,0}^T\beta_{2,0} + H_{2,1}^T\beta_{2,1}A_2$ contains
an interaction term between the second stage action $A_2$ and a patient's initial treatment
$A_1$, an interaction between $A_2$ and adherence to their initial medication $X_{2,1}$, a main effect
for $A_2$, and main effects for all the other terms. Table (9) provides the second stage least
squares coefficients along with interval estimates. Examination of the residuals (not shown
here) showed no obvious signs of model misspecification. In short, the linear model described
above seems to fit the data reasonably well.

Table 4: Monte Carlo estimates of coverage probabilities of confidence intervals for the main
effect of action, $\beta_{1,1,1}$, at the 95% nominal level. Estimates are constructed using 1000 datasets
of size 150 drawn from each model, and 1000 bootstraps drawn from each dataset. Estimates
significantly below 0.95 at the 0.05 level are marked with *. There is no ST method when
there are three treatments at Stage 2. Examples are designated NR = nonregular, NNR =
near-nonregular, R = regular.

Two txts at stage 2    Ex. 1    Ex. 2    Ex. 3    Ex. 4    Ex. 5    Ex. 6    Ex. A    Ex. B    Ex. C
                       NR       NNR      NR       NNR      NR       R        R        NR       NNR
CPB                    0.385*   0.385*   0.430*   0.430*   0.457    0.436*   0.451    0.428*   0.428*
PPE                    0.365*   0.366    0.419    0.419    0.452    0.418*   0.452*   0.404*   0.403*
ST                     0.339    0.339    0.426    0.427    0.469    0.436    0.480*   0.426*   0.424*
ACI                    0.502    0.502    0.488    0.488    0.487    0.475    0.477    0.491    0.491

Three txts at stage 2  Ex. 1    Ex. 2    Ex. 3    Ex. 4    Ex. 5    Ex. 6    Ex. A    Ex. B    Ex. C
                       NR       NNR      NR       NNR      NR       R        R        NR       NNR
CPB                    0.446*   0.446    0.518*   0.518*   0.567*   0.518*   0.557    0.508*   0.507*
PPE                    0.415*   0.415*   0.500*   0.500*   0.557*   0.486*   0.548*   0.467*   0.465*
ACI                    0.716    0.716    0.663    0.663    0.643    0.643    0.625    0.673    0.673

Table 5: Monte Carlo estimates of the mean width of confidence intervals for the main effect
of action $\beta_{1,1,1}$ at the 95% nominal level. Estimates are constructed using 1000 datasets of
size 150 drawn from each model, and 1000 bootstraps drawn from each dataset. Models
have two treatments at each of two stages. Widths with corresponding coverage significantly
below nominal are marked with *. There is no ST method when there are three treatments at
Stage 2. Examples are designated NR = nonregular, NNR = near-nonregular, R = regular.

Two txts at stage 2    Ex. 1    Ex. 2    Ex. 3    Ex. 4    Ex. 5    Ex. 6    Ex. A    Ex. B    Ex. C
                       NR       NNR      NR       NNR      NR       R        R        NR       NNR
CPB                    0.892*   0.908*   0.924*   0.925*   0.940    0.930*   0.936    0.925*   0.931*
PPE                    0.926*   0.930*   0.933*   0.934*   0.934*   0.907*   0.928*   0.910*   0.909*
ST                     0.935*   0.930*   0.889*   0.878*   0.891*   0.620*   0.687*   0.686*   0.663*
ACI                    0.956    0.964    0.954    0.955    0.950    0.957    0.948    0.956    0.957

Table 6: Monte Carlo estimates of coverage probabilities of confidence intervals for the
coefficient of the intercept, $\beta_{1,0,1}$, at the 95% nominal level. Estimates are constructed using
1000 datasets of size 150 drawn from each model, and 1000 bootstraps drawn from each
dataset. Estimates significantly below 0.95 at the 0.05 level are marked with *. Examples
are designated NR = nonregular, NNR = near-nonregular, R = regular.

Two txts at stage 2    Ex. 1    Ex. 2    Ex. 3    Ex. 4    Ex. 5    Ex. 6    Ex. A    Ex. B    Ex. C
                       NR       NNR      NR       NNR      NR       R        R        NR       NNR
CPB                    0.404*   0.404*   0.430*   0.429*   0.457    0.449*   0.450    0.428*   0.428*
PPE                    0.376*   0.376*   0.418*   0.418*   0.451*   0.448*   0.453*   0.410*   0.410*
ST                     0.344*   0.344*   0.427*   0.427*   0.466*   0.469*   0.474*   0.430*   0.428*
ACI                    0.518    0.518    0.487    0.487    0.486    0.494    0.476    0.497    0.498

Table 7: Monte Carlo estimates of the mean width of confidence intervals for the coefficient of
the intercept, $\beta_{1,0,1}$, at the 95% nominal level. Estimates are constructed using 1000 datasets
of size 150 drawn from each model, and 1000 bootstraps drawn from each dataset. Models
have two treatments at each of two stages. Widths with corresponding coverage significantly
below nominal are marked with *. Examples are designated NR = nonregular, NNR =
near-nonregular, R = regular.

$X_{1,1} \in [0, 3]$ : baseline teacher reported mean ADHD symptom score. Measured at
    the end of the school year preceding the study.
$X_{1,2} \in \{0, 1\}$ : indicator of a diagnosis of ODD (oppositional defiant disorder) at
    baseline, coded so that $X_{1,2} = 0$ corresponds to no such diagnosis.
$X_{1,3} \in \{0, 1\}$ : indicator of a patient's prior exposure to ADHD medication, coded so
    that $X_{1,3} = 0$ corresponds to no prior exposure.
$A_1 \in \{-1, 1\}$ : initial treatment, coded so that $A_1 = -1$ corresponds to medication
    while $A_1 = 1$ corresponds to behavioral modification therapy.
$T \in \{6, 7, \ldots, 32\}$ : right censored time in weeks until patient is re-randomized.
$Y_1 = R1_{T = 32}$ : first stage response (see definition of $R$ below).
$X_{2,1} \in \{0, 1\}$ : indicator of patient's adherence to their initial treatment. Adherence
    is coded so that a value of $X_{2,1} = 0$ corresponds to low adherence
    (taking less than 100% of prescribed medication or attending less than
    75% of therapy sessions) while a value of $X_{2,1} = 1$ corresponds to high
    adherence.
$X_{2,2} \in \{1, \ldots, 8\}$ : month of non-response.
$A_2 \in \{-1, 1\}$ : second stage treatment, coded so that $A_2 = -1$ corresponds to
    augmenting the initial treatment with the treatment not received initially,
    and $A_2 = 1$ corresponds to enhancing (increasing the dosage of) the
    initial treatment.
$R \in \{1, 2, \ldots, 5\}$ : teacher reported Teacher Impairment Rating Scale (TIRS5) item score
    32 weeks after initial randomization to treatment (Fabiano et al. 2006).
    The TIRS5 is coded so that higher values correspond to better clinical
    outcomes.
$Y_2 = R1_{T < 32}$ : second stage outcome.

Table 8: Components of a single trajectory in the ADHD study.

Recall that the dependent variable in the first stage regression model is the predicted
future outcome $\tilde{Y}_1 = Y_1 + \max_{a_2 \in \{-1, 1\}} Q_2(H_2, a_2; \hat{\beta}_2)$. Since the predictors used in the first
stage must predate the assignment of first treatment, the available predictors in Table (8) are
baseline ADHD symptoms $X_{1,1}$, diagnosis of ODD at baseline $X_{1,2}$, indicator of a patient's
prior exposure to ADHD medication $X_{1,3}$, and first stage treatment $A_1$. The feature vectors
for the first stage are $H_{1,0} = (1, X_{1,1}, X_{1,2}, X_{1,3})^T$ and $H_{1,1} = (1, X_{1,3})^T$, so that the first stage
Q-function $Q_1(H_1, A_1; \beta_1) = H_{1,0}^T\beta_{1,0} + H_{1,1}^T\beta_{1,1}A_1$ contains an interaction term between the
first stage action $A_1$ and a patient's prior exposure to ADHD medication $X_{1,3}$, a main effect
for $A_1$, and main effects for all other covariates. The first stage regression coefficients are
estimated using least squares, $\hat{\beta}_1 = \arg\min_{\beta_1} P_n(\tilde{Y}_1 - Q_1(H_1, A_1; \beta_1))^2$. Table (10) provides
the least squares coefficients along with interval estimates formed using the ACI. Plots of the
residuals for this model (not shown here) show no obvious signs of model misspecification.
Again a linear model seems to provide a reasonable approximation to the Q-function in the
first stage.

Term                                 Coeff.            Estimate   Lower (5%)   Upper (95%)
Intercept                            $\beta_{2,0,1}$    1.36       0.48         2.26
Baseline symptoms                    $\beta_{2,0,2}$    0.94       0.48         1.39
ODD diagnosis                        $\beta_{2,0,3}$    0.92       0.46         1.41
Month of non-response                $\beta_{2,0,4}$    0.02      -0.20         0.20
Prior medication                     $\beta_{2,0,5}$   -0.27      -0.77         0.21
Adherence                            $\beta_{2,0,6}$    0.17      -0.28         0.66
First stage txt                      $\beta_{2,0,7}$    0.03      -0.18         0.23
Second stage txt                     $\beta_{2,1,1}$   -0.72      -1.13        -0.35
Second stage txt : Adherence         $\beta_{2,1,2}$    0.97       0.48         1.52
Second stage txt : First stage txt   $\beta_{2,1,3}$    0.05      -0.17         0.27

Table 9: Least squares coefficients and 90% interval estimates for second stage regression.

Term                                 Coeff.            Estimate   Lower (5%)   Upper (95%)
Intercept                            $\beta_{1,0,1}$    2.61       2.09         3.08
Baseline symptoms                    $\beta_{1,0,2}$    0.72       0.46         1.01
ODD diagnosis                        $\beta_{1,0,3}$    0.75       0.38         1.07
Prior med. exposure                  $\beta_{1,0,4}$   -0.37      -0.82         0.01
Initial txt                          $\beta_{1,1,1}$    0.17      -0.05         0.39
Initial txt : Prior med. exposure    $\beta_{1,1,2}$   -0.32      -0.60        -0.05

Table 10: Least squares coefficients and 90% ACI interval estimates for first stage regression.
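
The two-stage regression procedure can be sketched in a few lines. The code below is a minimal illustration of Q-learning with linear working models and binary treatments coded $-1/1$, run on noiseless toy data rather than the ADHD data; it ignores the restriction of the second stage fit to re-randomized patients, and the function names `fit_q` and `q_learning` are hypothetical.

```python
import numpy as np


def fit_q(h0, h1, a, y):
    """Least squares fit of Q(h, a) = h0' b0 + (h1' b1) * a."""
    design = np.hstack([h0, h1 * a[:, None]])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    k = h0.shape[1]
    return coef[:k], coef[k:]


def q_learning(h10, h11, a1, y1, h20, h21, a2, y2):
    """Two-stage Q-learning with actions coded -1/1."""
    b20, b21 = fit_q(h20, h21, a2, y2)
    # max over a2 in {-1, 1} of Q2 equals the main part plus |contrast|.
    y1_tilde = y1 + h20 @ b20 + np.abs(h21 @ b21)
    b10, b11 = fit_q(h10, h11, a1, y1_tilde)
    return (b10, b11), (b20, b21)


# Noiseless toy data so the fitted coefficients can be checked by hand.
rng = np.random.default_rng(0)
n = 200
x1, a1, x2, a2 = (rng.choice([-1.0, 1.0], size=n) for _ in range(4))
ones = np.ones(n)
h20 = np.column_stack([ones, x1, a1, x1 * a1, x2])
h21 = np.column_stack([ones, x2, a1])
h10 = np.column_stack([ones, x1])
h11 = np.column_stack([ones, x1])
y1 = np.zeros(n)
y2 = 1.0 + 0.5 * a2 + 0.5 * a1 * a2
(b10, b11), (b20, b21) = q_learning(h10, h11, a1, y1, h20, h21, a2, y2)
```

With this toy outcome the stage-two contrast is $0.5 + 0.5A_1$, so the pseudo-outcome is $1.5 + 0.5A_1$ and the first stage fit recovers an intercept of 1.5 and a treatment coefficient of 0.5.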

To construct an estimate of the optimal DTR, recall that for any $H_t = h_t$, $t = 1, 2$, the
estimated optimal DTR $\hat{\pi} = (\hat{\pi}_1, \hat{\pi}_2)$ satisfies $\hat{\pi}_t(h_t) \in \arg\max_{a_t} Q_t(h_t, a_t; \hat{\beta}_t)$. The coefficients
in Table (9) and the form of the second stage Q-function reveal that the second stage decision
rule $\hat{\pi}_2$ is quite simple. In particular, $\hat{\pi}_2$ prescribes treatment enhancement to patients
with high adherence to their initial medication and it prescribes treatment augmentation to
patients with low adherence to their initial medication. The first stage decision rule $\hat{\pi}_1$ is
equally simple. The coefficients in Table (10) show that the first stage decision rule $\hat{\pi}_1$
prescribes medication to patients who have had prior exposure to medication, and behavioral
modification to patients who have not had any such prior exposure.
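
The decision rules can be read off directly from the signs of the estimated treatment contrasts. The sketch below plugs in the point estimates reported in Tables (9) and (10); the function names are hypothetical, and the rules use only the point estimates, not the interval estimates.

```python
def stage2_rule(adherence, a1):
    """Estimated stage-two rule: 1 = enhance, -1 = augment.

    Uses the reported estimates for second stage txt (-0.72), its interaction
    with adherence (0.97), and its interaction with first stage txt (0.05).
    """
    contrast = -0.72 + 0.97 * adherence + 0.05 * a1
    return 1 if contrast > 0 else -1


def stage1_rule(prior_med):
    """Estimated stage-one rule: 1 = behavioral modification, -1 = medication.

    Uses the reported estimates for initial txt (0.17) and its interaction
    with prior medication exposure (-0.32).
    """
    contrast = 0.17 - 0.32 * prior_med
    return 1 if contrast > 0 else -1


stage2 = {(adh, a1): stage2_rule(adh, a1) for adh in (0, 1) for a1 in (-1, 1)}
stage1 = {pm: stage1_rule(pm) for pm in (0, 1)}
```

The adherence interaction dominates the stage-two contrast, so the recommended second stage action does not depend on the initial treatment.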

The prescriptions given by the estimated optimal DTR $\hat{\pi}$ are excessively decisive. That is,
they recommend one and only one treatment regardless of the amount of evidence in the data 
to support that the recommended treatment is in fact optimal. When there is insufficient 
evidence to recommend a single treatment as best for a given patient history, it is preferred to 
leave the choice of treatment to the clinician. This allows the clinician to recommend treat- 
ment based on cost, local availability, patient individual preference, and clinical experience. 
One way to assess if there is sufficient evidence to recommend a unique optimal treatment for 
a patient is to construct a confidence interval for the predicted difference in mean response 
across treatments. In the case of binary treatments, for a fixed patient history $H_t = h_t$, one
would construct a confidence interval for the difference $Q_t(h_t, 1; \beta_t^*) - Q_t(h_t, -1; \beta_t^*) = 2c^T\beta_t^*$,
where $c = (0^T, h_{t,1}^T)^T$. If this confidence interval contains zero then one would conclude that
there is insufficient evidence at the nominal level for a unique best treatment.

In this example, the patient features that interact with treatment are categorical. Conse- 
quently, we can construct confidence intervals for the predicted difference in mean response 
across treatments for every possible patient history. These confidence intervals are given in 
Table (11). The 90% confidence intervals suggest that there is insufficient evidence at the
first stage to recommend a unique best treatment for each patient history. Rather, we would
prefer not to make a strong recommendation at stage one, and leave treatment choice solely
at the discretion of the clinician. Conversely, in the second stage, the 90% confidence
intervals suggest that there is evidence to recommend a unique best treatment when a patient had
low adherence; this knowledge is important for evidence-based clinical decision making.



Stage   History                    Contrast for $\beta_{t,1}$   Lower (5%)   Upper (95%)   Conclusion
1       Had prior med.             (1 1)                        -0.48         0.16         Insufficient evidence
1       No prior med.              (1 0)                        -0.05         0.39         Insufficient evidence
2       High adherence and BMOD    (1 1 1)                      -0.08         0.69         Insufficient evidence
2       Low adherence and BMOD     (1 0 1)                      -1.10        -0.28         Sufficient evidence
2       High adherence and MEDS    (1 1 -1)                     -0.18         0.62         Insufficient evidence
2       Low adherence and MEDS     (1 0 -1)                     -1.25        -0.29         Sufficient evidence

Table 11: Confidence intervals for the predicted difference in mean response across treatments
across different patient histories at the 90% level. Confidence intervals that contain zero
indicate insufficient evidence for recommending a unique best treatment.
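
The conclusion column follows mechanically from whether each interval contains zero. A minimal sketch, using the interval endpoints reported in Table 11 (the function name `conclusion` and the dictionary keys are hypothetical):

```python
def conclusion(lower, upper):
    """Classify an interval by whether it contains zero."""
    if lower <= 0.0 <= upper:
        return "Insufficient evidence"
    return "Sufficient evidence"


# (stage, history) -> reported 90% interval endpoints.
intervals = {
    (1, "had prior med."): (-0.48, 0.16),
    (1, "no prior med."): (-0.05, 0.39),
    (2, "high adherence, BMOD"): (-0.08, 0.69),
    (2, "low adherence, BMOD"): (-1.10, -0.28),
    (2, "high adherence, MEDS"): (-0.18, 0.62),
    (2, "low adherence, MEDS"): (-1.25, -0.29),
}
results = {key: conclusion(lo, hi) for key, (lo, hi) in intervals.items()}
```

Only the two low adherence histories at stage two yield intervals that exclude zero.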



6 Discussion 

The task of constructing valid confidence intervals for the parameters in the Q-function is 
both scientifically important and statistically challenging. In this paper we offer a first step 
toward conducting inference in DTRs that is valid under local alternatives, computationally 
efficient, and easy to apply. The method presented here provides asymptotically valid
intervals regardless of the true configuration of underlying parameters $\beta^*$ or the joint distribution
of patient histories $H_t$ for $t = 1, 2, \ldots, T$. Theoretical guarantees were supported by a suite
of test examples in which the ACI performed favorably to competitors. The ACI is
conservative when all of the coefficients of terms involving the second stage treatment are zero. It is
our experience that efforts to reduce this conservatism negatively impact the performance
of the resulting confidence interval for other generative models; we conjecture that this
conservatism cannot be ameliorated without negatively impacting the overall performance of
the confidence interval.

Robins' (2004) second proposal for a confidence set (see Section 4.2 above for a
description relevant to this setting) also adapts to regularity, as does the ACI.
A critical difference between the ACI and Robins' second proposal is the way in which the
pretest/preliminary confidence set is employed. Intuitively, both are used to restrict the
projection to a smaller set. However, the ACI uses the pretest to bound only the nonsmooth
parts of the statistic (the estimators of the stage 2 parameters are plugged into the smooth
parts of the statistic), whereas Robins' second proposal uses the preliminary confidence set
to bound both the smooth and nonsmooth parts of the score statistic. This argues in favor
of the ACI. On the other hand, in Robins' second proposal the confidence set is a union of
acceptance regions and the preliminary confidence set is used to restrict this union, whereas
the ACI can be viewed as the acceptance region for a supremum statistic. This argues in
favor of Robins' proposal. It would be most interesting to develop approximate algorithms
that facilitate the use of Robins' second confidence set and then to compare the resulting
approximate confidence set with the ACI.

There are a number of further avenues for work on this problem; we conclude by
identifying three of the most interesting. The first is extending the ACI to problems where
parameters are shared across stages. This setting occurs when a patient's status is modeled
as a series of renewals (as is often assumed in settings with a very large number of stages)
or when smoothness across stages is assumed. The second area of interest is the so-called
"large p small n" paradigm where the number of predictors in the Q-function far exceeds
the number of observations. This setting arises, for example, when a patient's genetic
information might be used to tailor treatment. A complication to the question of inference in
this setting is that it is preceded by the more fundamental question of how one should even
build Q-functions when $p \gg n$, but variable selection methods used in one-stage regression
are likely to find use in the multi-stage case as well. Penalized estimation and Q-learning
in one-stage decision problems are discussed in Qian and Murphy (2009) and in multi-stage
problems in Song et al. (2010). The third area of interest concerns reducing the bias in the
estimation of the stage 1 treatment effect (recall that if the stage 2 effect is zero for some
patient features then the bias is of order $1/\sqrt{n}$). The most promising work in this area seems
to be that of Song et al. (2010). This work produces nonregular estimators; it would be
most interesting to develop confidence intervals that are valid under local alternatives even
though the estimators are nonregular.
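
The order $1/\sqrt{n}$ of this bias can be seen in a scalar toy problem. The sketch below is an assumption-laden stand-in for the stage 2 maximization, not the estimator studied in the paper: it takes $\bar{X} \sim N(\theta, \sigma^2/n)$ and uses $\max(\bar{X}, 0)$ to estimate $\max(\theta, 0)$. For a normal variable with mean $\theta$ and standard deviation $s$, $E\max(\bar{X}, 0) = \theta\Phi(\theta/s) + s\phi(\theta/s)$, so at $\theta = 0$ the bias is $\sigma/\sqrt{2\pi n}$. The function name `bias_of_plugin_max` is hypothetical.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bias_of_plugin_max(theta, sigma, n):
    """Exact bias of max(Xbar, 0) for max(theta, 0) when Xbar ~ N(theta, sigma^2 / n)."""
    s = sigma / math.sqrt(n)
    z = theta / s
    mean_of_max = theta * normal_cdf(z) + s * normal_pdf(z)
    return mean_of_max - max(theta, 0.0)


for n in (25, 100, 400, 1600):
    scaled_at_zero = math.sqrt(n) * bias_of_plugin_max(0.0, 1.0, n)
    scaled_away = math.sqrt(n) * bias_of_plugin_max(0.5, 1.0, n)
    print(n, scaled_at_zero, scaled_away)
```

At $\theta = 0$ the bias scaled by $\sqrt{n}$ stays at the constant $1/\sqrt{2\pi}$ for every $n$, whereas at $\theta = 0.5$ the scaled bias shrinks rapidly toward zero as $n$ grows. This is the toy analogue of the nonregular point where the stage 2 effect is zero.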

References 

Donald W. Andrews and Gustavo Soares. Inference for Parameters Defined by Moment
Inequalities Using Generalized Moment Selection. SSRN eLibrary, 2007.

D.W.K. Andrews. Testing when a parameter is on the boundary of the maintained hypoth- 
esis. Econometrica, 69(3):683-734, 2001. 

D.W.K. Andrews. Generalized method of moments estimation when a parameter is on a 
boundary. Journal of Business and Economic Statistics, 20(4):530-544, 2002. 

D.W.K. Andrews and P. Guggenberger. Incorrect asymptotic size of subsampling procedures 
based on post-consistent model selection estimators. Journal of Econometrics, 152(1):19- 
27, 2009. 

D.W.K. Andrews and W. Ploberger. Optimal tests when a nuisance parameter is present 
only under the alternative. Econometrica: Journal of the Econometric Society, pages 
1383-1414, 1994. 

R. Beran. Diagnosing bootstrap success. Annals of the Institute of Statistical Mathematics,
49(1):1-24, 1997.






PJ Bickel, AJ Klaassen, Y. Ritov, and JA Wellner. Efficient and adaptive inference in 
semi-parametric models. Johns Hopkins University Press, Baltimore, 1993. 

D. Blatt, Susan A. Murphy, and J. Zhu. A-learning for approximate planning. Technical 
Report 04-63, The Methodology Center, Pennsylvania State University, 2004. 

B. Chakraborty, S. Murphy, and V. Strecher. Inference for non-regular parameters in optimal
dynamic treatment regimes. Statistical Methods in Medical Research, 19(3), 2009.

A. Chatterjee and S.N. Lahiri. Bootstrapping Lasso estimators. Journal of the American
Statistical Association, 106(494):608-625, June 2011.

Jeesen Chen. Notes on the bias-variance trade-off phenomenon. A Festschrift for Herman 
Rubin: Institute of Mathematical Statistics, 45:207-217, 2004. 

Xu Cheng. Robust confidence intervals in nonlinear regression under weak identification.
Job Market Paper, 2008.

Sandor Csorgo and Andrew Rosalsky. A survey of limit laws for bootstrapped sums. Inter- 
national Journal of Mathematics and Mathematical Statistics, 45:2835-2861, 2003. 

Donald W.K. Donald. Testing when a parameter is on the boundary of the maintained 
hypothesis. Econometrica, 69:683-734, 2001. 

Hani Doss and Jayaram Sethuraman. The price of bias reduction when there is no unbiased
estimate. Annals of Statistics, 17(1):440-442, 1989.

R.B. D'Agostino. Departures from normality, tests for. Encyclopedia of Statistical Sciences,
2:315-324, 1982.

G.A. Fabiano, W.E. Pelham, D.A. Waschbusch, E.M. Gnagy, B.B. Lahey, A.M. Chronis, and
L. Burrows-MacLean. A practical measure of impairment: Psychometric properties of the
impairment rating scale in samples of children with attention deficit hyperactivity disorder
and two school-based samples. Journal of Clinical Child and Adolescent Psychology,
35:369-385, 2006.

R. Henderson, P. Ansell, and D. Alshibani. Regret-Regression for Optimal Dynamic
Treatment Regimes. Biometrics, 66(4), 2009.

Keisuke Hirano and Jack Porter. Impossibility results for nondifferentiable functionals. MPRA
paper, University Library of Munich, Germany, 2009. URL
http://econpapers.repec.org/RePEc:pra:mprapa:15990.

Michael R. Kosorok. Introduction to empirical processes and semiparametric inference. 
Springer, 2008. 

P.W. Lavori and R. Dawson. A design for testing clinical strategies: biased adaptive
within-subject randomization. Journal of the Royal Statistical Society: Series A (Statistics in
Society), 163(1):29-38, 2000.

H. Leeb and B.M. Poetscher. The finite-sample distribution of post-model-selection
estimators and uniform versus nonuniform approximations. Econometric Theory, 19(1):100-142,
2003.

H. Leeb and B.M. Potscher. Model selection and inference: Facts and fiction. Econometric
Theory, 21(1):21-59, 2005.

J.A. Lieberman, T.S. Stroup, J.P. McEvoy, M.S. Swartz, R.A. Rosenheck, D.O. Perkins, 
R.S.E. Keefe, S.M. Davis, C.E. Davis, B.D. Lebowitz, et al. Effectiveness of antipsychotic 
drugs in patients with chronic schizophrenia. The New England journal of medicine, 353 
(12):1209, 2005. 






R.C. Liu and L.D. Brown. Nonexistence of informative unbiased estimators in singular 
problems. The Annals of Statistics, pages 1-13, 1993a. 

Richard C. Liu and Lawrence D. Brown. Nonexistence of informative unbiased estimators 
in singular problems. Annals of Statistics, 21(1):1-13, 1993b. 

E.E.M. Moodie, T.S. Richardson, and D.A. Stephens. Demystifying optimal dynamic
treatment regimes. Biometrics, 63(2):447-455, 2007.

E.E.M. Moodie, T.S. Richardson, and D.A. Stephens. Estimating optimal dynamic regimes: 
Correcting bias under the null. Biometrics, 63(2):447-455, 2010. 

S.A. Murphy and L.M. Collins. Customizing treatment to the patient: adaptive treatment 
strategies. Drug and alcohol dependence, 88(Suppl 2):S1, 2007. 

Susan A. Murphy. Optimal dynamic treatment regimes. Journal of the Royal Statistical 
Society, Series B, 65(2):331-366, 2003. 

Susan A. Murphy. A generalization error for q-learning. Journal of Machine Learning 
Research, 6:1073-1097, Jul 2005. 

I. Nahum-Shani, M. Qian, D. Almirall, W.E. Pelham, B. Gnagy, G. Fabiano, J. Waxmonsky, 
J. Yu, and S.A. Murphy. Experimental design and primary data analysis methods for 
comparing adaptive interventions. Technical Report 10-108, The Methodology Center, 
The Pennsylvania State University, 2010a. 

I. Nahum-Shani, M. Qian, D. Almirall, W.E. Pelham, B. Gnagy, G. Fabiano, J. Waxmonsky, 
J. Yu, and S.A. Murphy. Q-learning: a data analysis method for constructing adaptive 
interventions. Technical Report 10-107, The Methodology Center, The Pennsylvania State 
University, 2010b. 






David M. Nickerson. Construction of a conservative region from projections of an exact
confidence region in multiple linear regression. The American Statistician, 48(2):120-124,
1994.

R.A. Olshen. The conditional level of the F-test. Journal of the American Statistical
Association, 68(343):692-698, 1973.

W.E. Pelham Jr and G.A. Fabiano. Evidence-based psychosocial treatments for attention- 
deficit/hyperactivity disorder. Journal of Clinical Child & Adolescent Psychology, 37(1): 
184-214, 2008. 

S. Pliszka. AACAP Work Group on Quality Issues. Practice parameter for the assessment
and treatment of children and adolescents with attention-deficit/hyperactivity disorder. J
Am Acad Child Adolesc Psychiatry, 46(7):894-921, 2007.

B.M. Potscher. Effects of model selection on inference. Econometric Theory, 7(2):163-185,
1991.

Min Qian and Susan A. Murphy. Performance Guarantees for Individualized Treatment 
Rules. Technical Report 498, Department of Statistics, University of Michigan, 2009. 

J.M. Robins. Optimal structural nested models for optimal sequential decisions. In Pro- 
ceedings of the Second Seattle Symposium in Biostatistics: Analysis of Correlated Data, 
2004. 

A.J. Rush, M. Trivedi, and M. Fava. Depression, IV: STAR*D treatment trial for depression.
American Journal of Psychiatry, 160(2):237, 2003.

H. Scheffe. The Analysis of Variance. John Wiley, New York, 1959. 

J. Shao. Bootstrap sample size in nonregular cases. Proceedings of the American Mathemat- 
ical Society, 122(4):1251-1262, 1994. 




R. Song, W. Wang, D. Zeng, and M. Kosorok. Penalized q-learning for dynamic treatment
regimes. Technical Report arXiv:1108.5338v1, arxiv.org, 2011.

A.A. Tsiatis. Semiparametric theory and missing data. Springer Verlag, 2006.

C.J.C.H. Watkins and P. Dayan. Q-learning. Machine learning, 8(3):279-292, 1992.

William E. Pelham Jr., Gregory Fabiano, James Waxmonsky, Andrew Greiner, Martin Hoffman,
Susan Murphy, E. Michael Foster, Jihnhee Yu, Elizabeth Gnagy, Ira Bhatia, Jessica
Verley, and Kate Tresco. Adaptive pharmacological and behavioral treatments for
children with ADHD: Sequencing, combining, and escalating doses. Institute of Educational
Sciences' Third Annual Research Conference, Washington, DC, 2008.

H. Working and H. Hotelling. Application of the theory of error to the interpretation of
trends. Journal of the American Statistical Association, 25(165):73-85, 1929.

Y. Zhao, M.R. Kosorok, and D. Zeng. Reinforcement learning design for cancer clinical 
trials. Statistics in Medicine, 28:3294-3315, 2009. 






Statistical Inference in Dynamic Treatment Regimes 

Online Supplement 

Eric B. Laber, Daniel J. Lizotte, Min Qian, and Susan A. Murphy 

November 10, 2011 



Contents 

1 Proofs of results stated in the main body of the article
    1.1 Results for second stage parameters
    1.2 A characterization of the first stage coefficients and the upper bound U(c)
    1.3 Note on computation

2 Extension of the ACI to more than two stages and more than two treatments
    2.1 Q-Learning in the general case
    2.2 ACI in the general case
        2.2.1 Example: ACI for three stages
    2.3 Properties of the ACI in the general case
    2.4 Sketched proofs for the ACI in three stages and more than two treatments

3 Bias reduction for non-regular problems

4 Additional empirical results
    4.1 Varying dataset size
    4.2 Models with ternary actions
    4.3 Models with three stages



1 Proofs of results stated in the main body of the article

Throughout Sections 1 and 2, let $K$ denote a sufficiently large positive constant that may
vary from line to line. Let $D_p$ denote the space of $p \times p$ symmetric positive definite matrices
equipped with the spectral norm, and for any $k \in (0, 1)$, let $D_p^k$ denote the subset of $D_p$
with members having eigenvalues in the range $[k, 1/k]$. For any class of real-valued functions
$\mathcal{F}$, let $\rho_P(f) = \{P(f - Pf)^2\}^{1/2}$ denote the centered $L_2$-norm on $\mathcal{F}$, $l^\infty(\mathcal{F})$ denote the
space of uniformly bounded real-valued functions on $\mathcal{F}$ equipped with the sup norm, and
$C_b(\mathcal{F})$ denote the subspace of $l^\infty(\mathcal{F})$ of continuous and bounded functions from $\mathcal{F}$ into $\mathbb{R}$,
respectively. Furthermore, let $\mathbb{G}_n = \sqrt{n}(\mathbb{P}_n - P)$, $\mathbb{G}_n^{(b)} = \sqrt{n}(\hat{\mathbb{P}}_n^{(b)} - \mathbb{P}_n)$, and $P_M$ denote
probability taken with respect to the bootstrap weights defining the bootstrap empirical
measure, respectively.

1.1 Results for second stage parameters 

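In this section we will characterize the limiting distributions of the second stage
parameters under fixed and local alternatives. We will also derive the limiting distribution of
the bootstrap analog of the second stage parameters. For convenience, let $p_{t0} = \dim(\beta_{t,0}^*)$,
$p_{t1} = \dim(\beta_{t,1}^*)$, and $p_t = \dim(\beta_t^*) = p_{t0} + p_{t1}$ for $t = 1, 2$.

Theorem 1.1. Assume (A1) and (A2) and fix $a \in \mathbb{R}^{p_2}$, then

1. $a^T\sqrt{n}(\hat\beta_2 - \beta_2^*) \rightsquigarrow a^T\Sigma_{2,\infty}^{-1}\mathbb{Z}_\infty$;

2. $a^T\sqrt{n}(\hat\beta_2^{(b)} - \hat\beta_2) \rightsquigarrow_{P_M} a^T\Sigma_{2,\infty}^{-1}\mathbb{Z}_\infty$ in $P$-probability; and

3. if in addition (A4) holds, $a^T\sqrt{n}(\hat\beta_2 - \beta_{2,n}^*) \rightsquigarrow_{P_n} a^T\Sigma_{2,\infty}^{-1}\mathbb{Z}_\infty$;

where $\mathbb{Z}_\infty$ is a mean zero normal random vector with covariance matrix $P[B_2B_2^T(Y_2 - B_2^T\beta_2^*)^2]$.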
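Proof. Define the class of functions $\mathcal{F}_2$ as

\[ \mathcal{F}_2 = \{ f(b_2, y_2; a, \beta_2) = a^Tb_2(y_2 - b_2^T\beta_2) : a, \beta_2 \in \mathbb{R}^{p_2},\ \|a\| \le K,\ \|\beta_2\| \le K \}, \tag{1} \]

and the function $w_2 : D_{p_2} \times l^\infty(\mathcal{F}_2) \times \mathbb{R}^{p_2} \times \mathbb{R}^{p_2} \to \mathbb{R}$ as

\[ w_2(\Sigma, \mu, \beta_2, a) = \mu\left( a^T\Sigma^{-1}B_2(Y_2 - B_2^T\beta_2) \right). \tag{2} \]

Since the estimated covariance matrices $\hat\Sigma_2 = \mathbb{P}_nB_2B_2^T$ and $\hat\Sigma_2^{(b)} = \hat{\mathbb{P}}_n^{(b)}B_2B_2^T$ are weakly
consistent (by Lemma 1.3), we will avoid additional notation by assuming they are nonsingular
for all $n$ without loss of generality. Thus

\[ a^T\sqrt{n}(\hat\beta_2 - \beta_2^*) = w_2(\hat\Sigma_2, \mathbb{G}_n, \beta_2^*, a), \qquad a^T\sqrt{n}(\hat\beta_2^{(b)} - \hat\beta_2) = w_2(\hat\Sigma_2^{(b)}, \mathbb{G}_n^{(b)}, \hat\beta_2, a), \]

\[ \text{and} \quad a^T\sqrt{n}(\hat\beta_2 - \beta_{2,n}^*) = w_2(\hat\Sigma_2, \sqrt{n}(\mathbb{P}_n - P_n), \beta_{2,n}^*, a). \]

In addition, note that $a^T\Sigma_{2,\infty}^{-1}\mathbb{Z}_\infty = w_2(\Sigma_{2,\infty}, \mathbb{G}_\infty, \beta_2^*, a)$, where $\mathbb{G}_\infty$ is a tight Gaussian process
in $l^\infty(\mathcal{F}_2)$ with covariance function $\mathrm{Cov}(\mathbb{G}_\infty f_1, \mathbb{G}_\infty f_2) = P(f_1 - Pf_1)(f_2 - Pf_2)$. Results 1
and 3 follow from Lemmas 1.2 - 1.5 and the continuous mapping theorem (Theorem 1.3.6 of
van der Vaart and Wellner 1996). Result 2 follows from the bootstrap continuous mapping
theorem (Theorem 10.8 of Kosorok 2008) together with Lemmas 1.2 - 1.6. □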
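
Result 2 is the statement that justifies bootstrapping the second stage least squares estimator. As a rough illustration only, the sketch below draws the bootstrap analog $a^T\sqrt{n}(\hat\beta_2^{(b)} - \hat\beta_2)$ for a toy model with two coefficients and uses its quantiles to form a 90% interval for $a^T\beta_2^*$. Everything specific here is an assumption made for the example: the simulated data, the nonparametric resampling scheme, and the helper names `ols_2d`, `bootstrap_contrast`, and `percentile`.

```python
import math
import random


def ols_2d(rows):
    """Least squares for y = b0 + b1 * x via the 2x2 normal equations."""
    n = len(rows)
    sx = sum(x for x, _ in rows)
    sy = sum(y for _, y in rows)
    sxx = sum(x * x for x, _ in rows)
    sxy = sum(x * y for x, y in rows)
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("singular design")
    b1 = (n * sxy - sx * sy) / det
    b0 = (sy - b1 * sx) / n
    return b0, b1


def bootstrap_contrast(rows, a, n_boot, rng):
    """Return beta_hat and sorted draws of a' sqrt(n) (beta_boot - beta_hat)."""
    n = len(rows)
    beta_hat = ols_2d(rows)
    draws = []
    for _ in range(n_boot):
        # Resample trajectories with replacement (multinomial bootstrap weights).
        resample = [rows[rng.randrange(n)] for _ in range(n)]
        beta_b = ols_2d(resample)
        diff = sum(ai * (bb - bh) for ai, bb, bh in zip(a, beta_b, beta_hat))
        draws.append(math.sqrt(n) * diff)
    return beta_hat, sorted(draws)


def percentile(sorted_values, q):
    """Simple empirical quantile of an already sorted list."""
    idx = int(math.floor(q * len(sorted_values)))
    idx = min(len(sorted_values) - 1, max(0, idx))
    return sorted_values[idx]


rng = random.Random(0)
rows = []
for _ in range(200):
    x = rng.choice((-1.0, 1.0))            # toy second stage treatment coding
    rows.append((x, 1.0 + 0.5 * x + rng.gauss(0.0, 1.0)))

a = (0.0, 1.0)                              # contrast picking out the treatment effect
beta_hat, draws = bootstrap_contrast(rows, a, 500, rng)
n = len(rows)
estimate = sum(ai * bi for ai, bi in zip(a, beta_hat))
lower = estimate - percentile(draws, 0.95) / math.sqrt(n)
upper = estimate - percentile(draws, 0.05) / math.sqrt(n)
print(estimate, lower, upper)
```

The interval inverts the bootstrap distribution of the centered and scaled contrast, so its validity in this regular second stage setting rests on exactly the kind of agreement between results 1 and 2 stated in the theorem.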

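Lemma 1.2. Under (A1), the function $w_2$ defined in (2) is continuous at points in
$D_{p_2} \times C_b(\mathcal{F}_2) \times \mathbb{R}^{p_2} \times \mathbb{R}^{p_2}$.

Proof. Let $\epsilon > 0$ be arbitrary and let $(\Sigma, \mu, \beta_2, a)$ be an element of $D_{p_2} \times C_b(\mathcal{F}_2) \times \mathbb{R}^{p_2} \times \mathbb{R}^{p_2}$.
In addition, let $(\Sigma', \mu', \beta_2', a')$ be an element of $D_{p_2} \times l^\infty(\mathcal{F}_2) \times \mathbb{R}^{p_2} \times \mathbb{R}^{p_2}$. From the form of
$\mathcal{F}_2$ and the moment assumptions in (A1) we see that if $\Sigma - \Sigma'$, $a - a'$, and $\beta_2 - \beta_2'$ are small
then so must $\rho_P(f - f')$ be small, where

\[ f(B_2, Y_2) = a^T\Sigma^{-1}B_2(Y_2 - B_2^T\beta_2), \]
\[ f'(B_2, Y_2) = a'^T\Sigma'^{-1}B_2(Y_2 - B_2^T\beta_2'). \]

In particular, we can choose $\delta > 0$ sufficiently small so that $\|\Sigma - \Sigma'\| + \|a - a'\| + \|\beta_2 - \beta_2'\| < \delta$
implies that $\rho_P(f - f')$ is small enough to guarantee, by appeal to the continuity of $\mu$, that
$|\mu(f) - \mu(f')| < \epsilon/2$. Finally, note that

\[ |w_2(\Sigma, \mu, \beta_2, a) - w_2(\Sigma', \mu', \beta_2', a')| \le |\mu(f) - \mu(f')| + \|\mu - \mu'\|_{\mathcal{F}_2}. \]

Let $\delta' = \min(\delta, \epsilon/2)$, then $\|\Sigma - \Sigma'\| + \|\mu - \mu'\|_{\mathcal{F}_2} + \|\beta_2 - \beta_2'\| + \|a - a'\| < \delta'$ implies that
$|w_2(\Sigma, \mu, \beta_2, a) - w_2(\Sigma', \mu', \beta_2', a')| < \epsilon$. Thus, the desired result is proved. □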

Having established the continuity of $w_2$, the next step will be to characterize the limiting
behavior of $\beta_{2,n}^*$, $\hat\beta_2$, $\hat\Sigma_2$, $\hat\Sigma_2^{(b)}$, and the limiting distributions of $\sqrt{n}(\mathbb{P}_n - P_n)$ and
$\sqrt{n}(\hat{\mathbb{P}}_n^{(b)} - \mathbb{P}_n)$. These limits are established in a series of lemmas. Once this has been accomplished we
will be able to apply the continuous mapping theorem to obtain the limiting distributions
of $\sqrt{n}(\hat\beta_2 - \beta_2^*)$, $\sqrt{n}(\hat\beta_2 - \beta_{2,n}^*)$, and $\sqrt{n}(\hat\beta_2^{(b)} - \hat\beta_2)$. Define $\Sigma_{2,n} = P_nB_2B_2^T$.

Lemma 1.3. Assume (A1)-(A2), then $\hat\Sigma_2 \to_P \Sigma_{2,\infty}$ and $\hat\Sigma_2^{(b)} \to_{P_M} \Sigma_{2,\infty}$ in $P$-probability as
$n \to \infty$. Furthermore, if (A4) holds, then $\hat\Sigma_2 \to_{P_n} \Sigma_{2,\infty}$ as $n \to \infty$.

Proof. The first two claims follow from the weak law of large numbers (Bickel and Freedman
1981; Csorgo and Rosalsky 2003). For the third claim, note that $\hat\Sigma_2 - \Sigma_{2,\infty} = (\hat\Sigma_2 - \Sigma_{2,n}) +
(\Sigma_{2,n} - \Sigma_{2,\infty})$ and $\hat\Sigma_2 - \Sigma_{2,n} \to_{P_n} 0$ by the law of large numbers. Below we show that $\Sigma_{2,n} \to \Sigma_{2,\infty}$.
This will complete the proof.

Let $c \in \mathbb{R}^{p_2}$ be arbitrary and define $u = c^TB_2B_2^Tc$. We will show that $\int u\,(dP_n - dP) = o(1)$.
First, note that

\[ \int u\,(dP_n - dP) = \int u\,(dP_n^{1/2} + dP^{1/2})(dP_n^{1/2} - dP^{1/2}). \]

Furthermore, the absolute value of the foregoing expression is bounded above by

\[ \int |u|\,(dP_n^{1/2} + dP^{1/2})\,|dP_n^{1/2} - dP^{1/2}| \le \sqrt{\int u^2(dP_n^{1/2} + dP^{1/2})^2}\,\sqrt{\int (dP_n^{1/2} - dP^{1/2})^2}, \]

where the last inequality is simply Hölder's inequality. Next, note that owing to the inequality
$(\sqrt{a} + \sqrt{b})^2 \le 2a + 2b$ it follows that

\[ \int u^2(dP_n^{1/2} + dP^{1/2})^2 \le 2\int u^2\,dP_n + 2\int u^2\,dP = O(1), \]

by appeal to (A4). Now write



2 

1/2' 



-\j,^dP.V-nj.dPndPl'^-dP-^) 
The right hand side of the preceding display is equal to 

0{l/n) + j gdpy^(dP^/^ - dP^/^) < 0{l/n) + n-^/^^Z J ^2^py^ J (^pV2 _ ^pi/2)2^ 

which is $o(1)$. Thus $\Sigma_{2,n} \to \Sigma_{2,\infty}$. □

Lemma 1.4. Under (A1) and (A2), $\hat\beta_2 \to_P \beta_2^*$ as $n \to \infty$. If, in addition (A4) holds, then

\[ \lim_{n\to\infty} \sqrt{n}(\beta_{2,n}^* - \beta_2^*) = \Sigma_{2,\infty}^{-1}\,PgB_2(Y_2 - B_2^T\beta_2^*). \]

Proof. $\hat\beta_2 \to_P \beta_2^*$ follows from the weak law of large numbers and Slutsky's lemma.
Recall that $0 = P_nB_2(Y_2 - B_2^T\beta_{2,n}^*)$, which we can write as

\[ 0 = (P_n - P)B_2(Y_2 - B_2^T\beta_2^*) - \Sigma_{2,n}(\beta_{2,n}^* - \beta_2^*), \]

so that for sufficiently large $n$ it follows that $\sqrt{n}(\beta_{2,n}^* - \beta_2^*) = \Sigma_{2,n}^{-1}\sqrt{n}(P_n - P)B_2(Y_2 - B_2^T\beta_2^*)$.
By appeal to (A4) it follows that for any vector $a \in \mathbb{R}^{p_2}$ we have $\sup_n P_n\{a^TB_2(Y_2 - B_2^T\beta_2^*)\}^2 <
\infty$. Theorem 3.10.12 of van der Vaart and Wellner (1996) ensures that

\[ \sqrt{n}(P_n - P)B_2(Y_2 - B_2^T\beta_2^*) \to PgB_2(Y_2 - B_2^T\beta_2^*) \]

as $n \to \infty$. This completes the proof. □

Lemma 1.5. Assume (A1)-(A2), then

1) $\mathbb{G}_n \rightsquigarrow_P \mathbb{G}_\infty$ in $l^\infty(\mathcal{F}_2)$, where $\mathcal{F}_2$ is defined in (1), and $\mathbb{G}_\infty$ is a tight Gaussian process
in $l^\infty(\mathcal{F}_2)$ with covariance function $\mathrm{Cov}(\mathbb{G}_\infty f_1, \mathbb{G}_\infty f_2) = P(f_1 - Pf_1)(f_2 - Pf_2)$; and

2) $\sup_{h \in BL_1} |E_M h(\sqrt{n}(\hat{\mathbb{P}}_n^{(b)} - \mathbb{P}_n)) - E h(\mathbb{G}_\infty)| \to_{P^*} 0$ in $l^\infty(\mathcal{F}_2)$.

If, in addition (A4) holds, then

3) $\sqrt{n}(\mathbb{P}_n - P_n) \rightsquigarrow_{P_n} \mathbb{G}_\infty$ in $l^\infty(\mathcal{F}_2)$.

Proof. First note that $\mathcal{F}_2$ is a subset of the pairwise product of the linear classes $\{a^Tb_2 : a \in
\mathbb{R}^{p_2}\}$ and $\{y_2 - b_2^T\beta_2 : \beta_2 \in \mathbb{R}^{p_2}\}$, each of which is VC-subgraph of index no more than $p_2 + 1$
and $P$-measurable. Under (A1), the envelope of $\mathcal{F}_2$, $F_2(B_2, Y_2) = K\|B_2\|(|Y_2| + K\|B_2\|)$, is
square integrable. This implies that $\mathcal{F}_2$ is $P$-Donsker, and 1) follows immediately. 2) follows
from Theorem 3.6.1 of van der Vaart and Wellner (1996). For 3), note that from (A4) it
follows that $\sup_f |P_nf|$ is a bounded sequence. The result follows from Theorem 3.10.12 of
van der Vaart and Wellner (1996). □

Lemma 1.6. The space $C_b(\mathcal{F}_2)$ is a closed subset of $l^\infty(\mathcal{F}_2)$ and $P(\mathbb{G}_\infty \in C_b(\mathcal{F}_2)) = 1$.

Proof. Let $\{\mu_n\}_{n \ge 1}$ be a convergent sequence of elements in $C_b(\mathcal{F}_2)$ and $\mu_0$ the limiting
element. For the first claim, we only need to show that $\|\mu_0\|_{\mathcal{F}_2} = \sup_{f \in \mathcal{F}_2} |\mu_0(f)|$ is bounded,
and for any $f \in \mathcal{F}_2$ and $\epsilon > 0$, there exists some positive $\delta$ depending on $f$ so that
$|\mu_0(f') - \mu_0(f)| < \epsilon$ for all $f' \in \mathcal{F}_2$ with $\rho_P(f', f) < \delta$. The boundedness argument follows by noticing
that $\|\mu_0\|_{\mathcal{F}_2} \le \|\mu_n\|_{\mathcal{F}_2} + \|\mu_n - \mu_0\|_{\mathcal{F}_2}$ for any $n$; in particular, for some fixed large enough
$n$, $\|\mu_n\|_{\mathcal{F}_2}$ is bounded by the fact that $\mu_n \in C_b(\mathcal{F}_2)$, and $\|\mu_n - \mu_0\|_{\mathcal{F}_2}$ is bounded above by a
constant due to the convergence of $\mu_n$ to $\mu_0$. For continuity, note that since $\mu_n$ converges to
$\mu_0$, we can choose $n^*$ so that $\|\mu_n - \mu_0\|_{\mathcal{F}_2} < \epsilon/4$ for all $n \ge n^*$. In addition, by the continuity
of $\mu_{n^*}$, there exists some $\delta > 0$ so that $|\mu_{n^*}(f') - \mu_{n^*}(f)| < \epsilon/4$ for all $\rho_P(f', f) < \delta$. Thus

\[ |\mu_0(f') - \mu_0(f)| \le 2\|\mu_0 - \mu_{n^*}\|_{\mathcal{F}_2} + |\mu_{n^*}(f') - \mu_{n^*}(f)| < 3\epsilon/4. \]

This implies that $C_b(\mathcal{F}_2)$ is closed.

Next note that $\mathbb{G}_\infty$ is a tight Gaussian process in $l^\infty(\mathcal{F}_2)$. By the argument in Section 1.5
of van der Vaart and Wellner (1996), almost all sample paths $f \mapsto \mathbb{G}_\infty(f, \omega)$ are uniformly
$\rho_2$-continuous, where $\rho_2(f_1, f_2) = [E(\mathbb{G}_\infty f_1 - \mathbb{G}_\infty f_2)^2]^{1/2}$ is a semimetric on $\mathcal{F}_2$. Since $\rho_2(f_1, f_2) =
[\mathrm{Var}(f_1 - f_2)]^{1/2} \le \rho_P(f_1, f_2)$, the continuity of the sample paths of $\mathbb{G}_\infty$ follows immediately.

□

1.2 A characterization of the first stage coefficients and the upper
bound U(c)

In this section we present the proofs for Theorems 2.1 and 2.2. We first derive an expansion
for the first stage coefficients and two useful expansions of the upper bound $U(c)$. The terms
in the aforementioned expansions will be treated individually in subsequent sections. We will
make use of the following functions.



1. wu ■■ Dp^ X L'pixpao X l°°{J^n) X X x W^+p^ ^ M is defined as 



where Dp^xpQo is the space of pi x p2o matrices equipped with the spectral norm, and 
^11 = [f{bi,yi,h2,o,h2,i) = aI&i(yi+/i2,o/52,o+[/i2,i/?2,i]+-&I/?i)+a2^'i(^2,ii^i)lftT^/3|,i>o, 

K). 

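2. $w_{12} : D_{p_1} \times l^\infty(\mathcal{F}_{12}) \times \mathbb{R}^{p_{21}} \times \mathbb{R}^{p_{21}} \to \mathbb{R}$ is defined as

\[ w_{12}(\Sigma_1, \mu, \nu, \gamma) = \mu\left[ c^T\Sigma_1^{-1}B_1\left( [H_{2,1}^T\nu + H_{2,1}^T\gamma]_+ - [H_{2,1}^T\gamma]_+ \right) 1_{H_{2,1}^T\beta_{2,1}^* = 0} \right], \tag{4} \]

where $\mathcal{F}_{12} = \{ f(b_1, h_{2,1}) = a^Tb_1\left( [h_{2,1}^T\nu + h_{2,1}^T\gamma]_+ - [h_{2,1}^T\gamma]_+ \right) 1_{h_{2,1}^T\beta_{2,1}^* = 0} : a \in \mathbb{R}^{p_1},\ \gamma, \nu \in
\mathbb{R}^{p_{21}},\ \max\{\|a\|, \|\nu\|\} \le K \}$.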



3. pii : Dpi X D^^^ X X x W^^ x M^^i x M ^ M, is defined 



as 



pii(Ei, E2i,2i, i^, V, 7, A) = 



x 1 



^2 1^21,21*f2,l 



, (5) 



where J^ii = |/(&i,/i2,i) = 0^61 ([/i^^^i/ - /i^_i7]+ - [/i5,i7]+) (1 (4^i-+^£i^^^-l/»J,i/32*,i=o) 

''2,1^21,2l'»2,l ~ 

a e e Mf2Smax{||a||, < /T, A e M, E2i,2i G D^^^}. 
4. $\rho_{12} : D_{p_1} \times l^\infty(\mathcal{F}_{12}) \times \mathbb{R}^{p_{21}} \times \mathbb{R}^{p_{21}} \to \mathbb{R}$ is defined as



Pl2(Ei,/X,i/,77) = /X 



, (6) 



+ 



where J'la = |aTfoi( + /i5,i77]+-[^2,i^]+-^2,i^)l/»^,i/3|,i>o-«^^i( [^2,1^^ + ^2,1^] 

1/^2,1^]+) l/^,V^,l<o : aeRPSi/eRf-,max{||a||,||z/||}<i^,7?GMf-}. 

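Using the foregoing functions, we have the following expressions for the first stage parameters:

\[ \begin{aligned}
c^T\sqrt{n}(\hat\beta_1 - \beta_1^*) = {} & w_{11}(\hat\Sigma_1, \hat\Sigma_{12}, \mathbb{G}_n, \mathbb{P}_n, \sqrt{n}(\hat\beta_2 - \beta_2^*), (\beta_1^{*T}, \beta_2^{*T})^T) \\
& + w_{12}(\hat\Sigma_1, \mathbb{P}_n, \sqrt{n}(\hat\beta_{2,1} - \beta_{2,1}^*), \sqrt{n}\beta_{2,1}^*) \\
& + \rho_{12}(\hat\Sigma_1, \mathbb{P}_n, \sqrt{n}(\hat\beta_{2,1} - \beta_{2,1}^*), \sqrt{n}\beta_{2,1}^*);
\end{aligned} \tag{7} \]

\[ \begin{aligned}
c^T\sqrt{n}(\hat\beta_1 - \beta_{1,n}^*) = {} & w_{11}(\hat\Sigma_1, \hat\Sigma_{12}, \sqrt{n}(\mathbb{P}_n - P_n), \mathbb{P}_n, \sqrt{n}(\hat\beta_2 - \beta_{2,n}^*), (\beta_{1,n}^{*T}, \beta_{2,n}^{*T})^T) \\
& + w_{12}(\hat\Sigma_1, \mathbb{P}_n, \sqrt{n}(\hat\beta_{2,1} - \beta_{2,1,n}^*), \sqrt{n}\beta_{2,1,n}^*) \\
& + \rho_{12}(\hat\Sigma_1, \mathbb{P}_n, \sqrt{n}(\hat\beta_{2,1} - \beta_{2,1,n}^*), \sqrt{n}\beta_{2,1,n}^*),
\end{aligned} \tag{8} \]

where $\hat\Sigma_{12} = \mathbb{P}_nB_1H_{2,0}^T$. Similarly, we can express the upper bound $U(c)$ in terms of the
above functions:

\[ \begin{aligned}
U(c) = {} & w_{11}(\hat\Sigma_1, \hat\Sigma_{12}, \mathbb{G}_n, \mathbb{P}_n, \sqrt{n}(\hat\beta_2 - \beta_2^*), (\beta_1^{*T}, \beta_2^{*T})^T) \\
& + \rho_{12}(\hat\Sigma_1, \mathbb{P}_n, \sqrt{n}(\hat\beta_{2,1} - \beta_{2,1}^*), \sqrt{n}\beta_{2,1}^*) \\
& - \rho_{11}(\hat\Sigma_1, \hat\Sigma_{21,21}, \mathbb{P}_n, \sqrt{n}(\hat\beta_{2,1} - \beta_{2,1}^*), \sqrt{n}\beta_{2,1}^*, \sqrt{n}\beta_{2,1}^*, \lambda_n) \\
& + \sup_{\gamma \in \mathbb{R}^{p_{21}}} \Big\{ w_{12}(\hat\Sigma_1, \mathbb{P}_n, \sqrt{n}(\hat\beta_{2,1} - \beta_{2,1}^*), \gamma) \\
& \qquad\qquad + \rho_{11}(\hat\Sigma_1, \hat\Sigma_{21,21}, \mathbb{P}_n, \sqrt{n}(\hat\beta_{2,1} - \beta_{2,1}^*), \sqrt{n}\beta_{2,1}^*, \gamma, \lambda_n) \Big\}.
\end{aligned} \tag{9} \]

We will also make use of the following alternative expression for the upper bound $U(c)$ under
$P_n$: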

U{C) = ^n(El,El2,V^(Pn-Pn),Pn,V^(/32-/32%),(/3i*;,/3;;^ 

+ pi2(Sl,P„, Vn{(52,l - /32,l,n)! V^/52*l,n) 

- S21,21,Pn, Vn{kl - /32*,l,n)> V^/32,l,n> V^/32,l,n> ^n) 

+ sup <^ Wi2(Sl,Pn,yn(^2,l -/32,l,n)>7) 
76RP21 l_ 




Similarly, we will make use of following expression for the bootstrap analog of the upper 
bound: 




Below we argue that $\rho_{11}$ and $\rho_{12}$ are negligible and $w_{11}$ and $w_{12}$ are continuous in such
a fashion so as to facilitate the use of continuous mapping theorems as presented in the
previous section.

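Analysis of Error Terms $\rho_{11}$ and $\rho_{12}$

In this section, we show that the functions $\rho_{11}$ and $\rho_{12}$ in the expressions for the first stage
parameters and upper bounds are negligible. The function $\rho_{11}$ is more difficult to handle so
we address it here and omit the proof involving $\rho_{12}$ as it involves similar ideas.

Theorem 1.7. Assume (A1)-(A3). Then

1. $\sup_{\gamma \in \mathbb{R}^{p_{21}}} |\rho_{11}(\hat\Sigma_1, \hat\Sigma_{21,21}, \mathbb{P}_n, \sqrt{n}(\hat\beta_{2,1} - \beta_{2,1}^*), \sqrt{n}\beta_{2,1}^*, \gamma, \lambda_n)| \to_P 0$, and

2. $\sup_{\gamma \in \mathbb{R}^{p_{21}}} |\rho_{11}(\hat\Sigma_1^{(b)}, \hat\Sigma_{21,21}^{(b)}, \hat{\mathbb{P}}_n^{(b)}, \sqrt{n}(\hat\beta_{2,1}^{(b)} - \hat\beta_{2,1}), \sqrt{n}\hat\beta_{2,1}, \gamma, \lambda_n)| \to_{P_M} 0$ almost surely $P$.

If, in addition, we assume (A4), then

3. $\sup_{\gamma \in \mathbb{R}^{p_{21}}} |\rho_{11}(\hat\Sigma_1, \hat\Sigma_{21,21}, \mathbb{P}_n, \sqrt{n}(\hat\beta_{2,1} - \beta_{2,1,n}^*), \sqrt{n}\beta_{2,1,n}^*, \gamma, \lambda_n)| \to_{P_n} 0$.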

Proof. First it is easy to verify that \[Hl-^^iy — i/2,i7]+ — [i?2,i7] + | < |^2,i'^l- Thus for any 
probability measure /i in l°°{J-u), 



|pii(Ei,E2i,2i,/i,z^,r?,7, A)| < i^<^ ||5i|| ||i72,i|| 1 , i^li- 

L y ^2,1^2,1-'^' ||H2,1 



+ /X ||5i||||i/2,l||l 



^2V^2*,l=0,p|^<-VAfe-/f 

+ it I \\Bi\\ ||-ff2l|| 1 , hT 

\ ' ^2,i/'2*.i7^o-\AA-^<rii^<VVfc+i^^ 

for a sufficiently large constant K > and a sufficiently small constant k G (0, 1). Since k is 
held constant there is no loss in generality taking k = 1. Define p'^ : xM^^^ xMxM ^ 

M as 



p[^{p,r),6,5') = p \\Bi\\ \\H2,i\\ 1 



+ 11 \\Bi\\ ||-ff2,l|| 1 ^ «T 



+ // ||5i||||//2,l||l ^ «T^. , (12) 
where = </(6i,/i2,i) = ||6i|| ||/i2,i||l ^ + H^ill |||/i2,i||l n^.r, + 

\M ||/i2,i||l ^ .T ,r^e Rf-,max{||?7||, < k]. Then 

/iT_i/3J_^^0,5'<||^<-5' J 

|pii(Ei,E2i,2i,/^,i^,77,7,A)| < (^/x,?7/^, (V^-X)/Vri, -(V^ + i^)/V^) 
for e In particular for n sufficiently large, 

|pn(sf\sS,2i,Pi'\ v^4i,7,A„)| < 

Kp[, (P(f\ ^2,1, ( - ^)/v^, -( - K)/^^ 

+ \\c\\ -/32,i)iipf {\m \\H,,\\) 

where we have assumed, without loss of generality, that $\hat\Sigma_{21,21}^{(b)}$ is the identity matrix. By
part 2 of Lemma 1.8 below, we see that the first term on the right hand side of the above
display is $o_{P_M}(1)$ almost surely $P$. To deal with the second term, for any $\epsilon, \delta > 0$, let $K$ be
sufficiently large so that $P_M(\|\sqrt{n}(\hat\beta_{2,1}^{(b)} - \hat\beta_{2,1})\| > K) < \delta$ for sufficiently large $n$ for almost
all sequences $P$. Then

Pm (||c|| - /32,i)||Pi^)||5i|| > e) 

< Pm {WV^Wi'} - P2,i)\\ > k) < 5, 

almost surely P. This completes the proof of result 2. Similar arguments can be used to 
prove results 1 and 3, and are omitted. □ 

Lemma 1.8. Let $\rho'_{11}$ be defined in (12). Assume (A1)-(A3), then

1. $\rho'_{11}(\mathbb{P}_n, \beta_{2,1}^*, (\sqrt{\lambda_n} - K)/\sqrt{n}, (-\sqrt{\lambda_n} - K)/\sqrt{n}) \to_P 0$, and

2. $\rho'_{11}(\hat{\mathbb{P}}_n^{(b)}, \hat\beta_{2,1}, (\sqrt{\lambda_n} - K)/\sqrt{n}, (-\sqrt{\lambda_n} - K)/\sqrt{n}) \to_{P_M} 0$, $P$-almost surely.

If, in addition, we assume (A4), then

Proof. The class $\mathcal{F}'_{11}$ is $P$-Donsker and measurable by Theorem 8.14 in Anthony and Bartlett
(2002) and Donsker preservation results (for example, see Theorem 2.10.6 in van der Vaart
and Wellner 1996). Note that by (A1) and (A4) $\sup_{f \in \mathcal{F}'_{11}} |Pf^2| < \infty$ and $\sup_{f \in \mathcal{F}'_{11}} |P_nf^2|$
is a bounded sequence. Thus, it follows that (i) $\|\mathbb{P}_n - P\| \to 0$ almost surely under $P$ in
$l^\infty(\mathcal{F}'_{11})$, (ii) $\|\hat{\mathbb{P}}_n^{(b)} - P\| \to 0$ almost surely $P_M$ for almost all sequences $P$ (Lemma 3.6.16
in van der Vaart and Wellner 1996), and (iii) $\|\mathbb{P}_n - P_n\| \to 0$ almost surely under $P_n$ in
$l^\infty(\mathcal{F}'_{11})$ (Theorem 3.10.12 in van der Vaart and Wellner 1996). Additionally, the argument
in the proof of Lemma 1.3 shows that $\hat\Sigma_1$ is convergent to $\Sigma_1$ under $P_n$, and the weak law
of large numbers establishes convergence under $P$. The bootstrap strong law shows that $\hat\Sigma_1^{(b)}$
converges to $\Sigma_1$ in $P_M$ probability for almost all sequences $P$.

Next we show that $\rho'_{11}$ is continuous at the point $(P, \beta_{2,1}^*, 0, 0)$. Let $\mu_n \to P$ in $l^\infty(\mathcal{F}'_{11})$,
$\eta_n \to \beta_{2,1}^*$, $\delta_n \to 0$, and $\delta'_n \to 0$. We have

\[ |\rho'_{11}(\mu_n, \eta_n, \delta_n, \delta'_n) - \rho'_{11}(P, \beta_{2,1}^*, 0, 0)| \le |\rho'_{11}(P, \eta_n, \delta_n, \delta'_n) - \rho'_{11}(P, \beta_{2,1}^*, 0, 0)| + \|\mu_n - P\|, \]

which converges to zero by the dominated convergence theorem. The results follow from the
continuous mapping theorems and the fact that $\rho'_{11}(P, \beta_{2,1}^*, 0, 0) = 0$. □

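Continuity of $w_{11}$ and $w_{12}$

To prove Theorems 2.1 and 2.2, we need to show that $w_{11}$ is continuous at points in
$(\Sigma_{1,\infty}, \Sigma_{12,\infty}, C_b(\mathcal{F}_{11}), P, \mathbb{R}^{p_2}, (\beta_1^{*T}, \beta_2^{*T})^T)$, that $w_{12}(\cdot, \cdot, \cdot, \sqrt{n}\beta_{2,1}^*)$ and $w_{12}(\cdot, \cdot, \cdot, \sqrt{n}\beta_{2,1,n}^*)$ are
continuous at points in $(\Sigma_{1,\infty}, P, \mathbb{R}^{p_{21}})$, and that $w'_{12}(\Sigma_1, \mu, \nu) \triangleq \sup_{\gamma \in \mathbb{R}^{p_{21}}} w_{12}(\Sigma_1, \mu, \nu, \gamma)$ is
continuous at points in $(\Sigma_{1,\infty}, P, \mathbb{R}^{p_{21}})$.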
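To prove the desired continuity of $w_{12}$ and $w'_{12}$, we will establish the stronger result that
$w_{12}$ is continuous at points $(\Sigma_{1,\infty}, P, \mathbb{R}^{p_{21}}, \gamma)$ uniformly in $\gamma$. That is, for any $\Sigma_n \to \Sigma_{1,\infty}$,
probability measures $\mu_n \to P$ and $\nu_n \to \nu$, we have

\[ \sup_\gamma \left| w_{12}(\Sigma_n, \mu_n, \nu_n, \gamma) - w_{12}(\Sigma_{1,\infty}, P, \nu, \gamma) \right| \to 0. \]

Note that

\[ \begin{aligned}
|w_{12}(\Sigma_n, \mu_n, \nu_n, \gamma) - w_{12}(\Sigma_{1,\infty}, P, \nu, \gamma)|
\le {} & |w_{12}(\Sigma_n, \mu_n, \nu_n, \gamma) - w_{12}(\Sigma_n, \mu_n, \nu, \gamma)| \\
& + |w_{12}(\Sigma_n, P, \nu, \gamma) - w_{12}(\Sigma_{1,\infty}, P, \nu, \gamma)| \\
& + |w_{12}(\Sigma_n, \mu_n, \nu, \gamma) - w_{12}(\Sigma_n, P, \nu, \gamma)| \\
\le {} & \mu_n\left( |c^T\Sigma_n^{-1}B_1|\,|H_{2,1}^T(\nu_n - \nu)| \right) + P\left( |c^T(\Sigma_n^{-1} - \Sigma_{1,\infty}^{-1})B_1|\,|H_{2,1}^T\nu| \right) \\
& + \left| (\mu_n - P)\left( c^T\Sigma_n^{-1}B_1\left( [H_{2,1}^T\nu + H_{2,1}^T\gamma]_+ - [H_{2,1}^T\gamma]_+ \right) 1_{H_{2,1}^T\beta_{2,1}^* = 0} \right) \right|.
\end{aligned} \]

By (A2), we have that $\|\Sigma_n^{-1}\|$ is bounded above for sufficiently large $n$, where $\|\cdot\|$ of a
matrix denotes the spectral norm of the matrix. Thus the first term in the above display is
bounded by $\|c\|\,\|\Sigma_n^{-1}\|\,\mu_n(\|B_1\|\,\|H_{2,1}\|)\,\|\nu_n - \nu\| = o(1)$ and the second term in the above
display is bounded by $\|c\|\,\|\Sigma_n^{-1} - \Sigma_{1,\infty}^{-1}\|\,P(\|B_1\|\,\|H_{2,1}\|)\,\|\nu\| = o(1)$. For the third term, note
that if $\|\nu\| = 0$, then it is zero. Otherwise,

\[ \begin{aligned}
& \left| (\mu_n - P)\left( c^T\Sigma_n^{-1}B_1\left( [H_{2,1}^T\nu + H_{2,1}^T\gamma]_+ - [H_{2,1}^T\gamma]_+ \right) 1_{H_{2,1}^T\beta_{2,1}^* = 0} \right) \right| \\
& \quad = \|\nu\| \left| (\mu_n - P)\left( c^T\Sigma_n^{-1}B_1\left( [H_{2,1}^T\nu/\|\nu\| + H_{2,1}^T\gamma/\|\nu\|]_+ - [H_{2,1}^T\gamma/\|\nu\|]_+ \right) 1_{H_{2,1}^T\beta_{2,1}^* = 0} \right) \right| \\
& \quad \le K\,\|\mu_n - P\|_{\mathcal{F}_{12}}\,\|\nu\| = o(1).
\end{aligned} \]

This establishes the continuity of $w_{12}$ and hence of $w'_{12}$. The continuity of $w_{11}$ can be established
through similar arguments and is therefore omitted.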
Remark 1.9 (Plug-in Pretesting Approach). A natural approach to constructing a
confidence interval in a non-regular problem is "a plug-in pretesting approach." This approach
is similar in spirit to the ACI in that it partitions the training data using a series of
hypothesis tests and uses different approximations on each partition. In particular, the plug-in
pretesting estimator of $c^T\sqrt{n}(\hat\beta_1 - \beta_1^*)$ is given by

cTW„ + cTEr^P„S[U„l^(^^_^)>,^ + c^t^'F^Bj [Hl.W,,] ^ If (^,_,)<,„. (13) 

Confidence intervals are formed by bootstrapping this estimator. Under fixed alternatives, the
plug-in pretesting estimator (PPE) is consistent. This consistency is established by recalling
that $1_{T(h_{2,1}) \le \lambda_n} \to 1_{h_{2,1}^T\beta_{2,1}^* = 0}$ uniformly over sets of probability near one (see the next section).

However intuitive, the PPE does not perform well in small samples under some
generative models (see the main body of the paper and the last section of this supplement). One
explanation for this underperformance is that the PPE is not consistent under local
alternatives. In particular, under a local generative model as described in (A4), it can be shown
that the difference between the PPE and $\sqrt{n}(\hat\beta_1 - \beta_{1,n}^*)$ is equal to

c-^t-,'F^Bl[[Hl,{N^;:)-rl)]^-[Hl^^^ (14) 

which does not vanish for any alternative $\gamma$ for which $H_{2,1}^T\gamma$ is not identically zero with
probability one.

The expression in (14) offers yet another view of the ACI. In particular, one can view
the last term of $U(c)$ as approximating the supremum over local alternatives of the difference
between the PPE and the target $c^T\sqrt{n}(\hat\beta_1 - \beta_{1,n}^*)$. In this way, the ACI can be thought of
as a corrected version of the PPE where the correction is intended to safeguard against poor
small sample performance.
1.3 Note on computation

The time complexity of computing the ACI bounds for a bootstrap sample is not much larger 
than the 0{n^) time required to compute the QR decomposition of the design matrix for 
least-squares linear regression. The most computationally expensive addition to the standard
bootstrap is the approximation of the supremum in the third term of equation (9) in the
main body of the paper (hereafter (11)), and its corresponding infimum for the lower bound.

In order to approximate the supremum in (11), we use a simple stochastic search. Since
the intuition is that we want to take the supremum over a region near $\sqrt{n}\beta^*_{2,1}$, we draw
candidate vectors $\gamma^1, \gamma^2, \ldots, \gamma^{n_\gamma}$ uniformly from within a large set centered at $\hat\beta_{2,1}$. We then
evaluate the supremum objective in the third term of (11) at each of the $\gamma^i$ and use the
maximum to approximate the supremum. Similarly, we
use the minimum to approximate the infimum in the lower bound. In our experiments, we
used $n_\gamma = 1000$. The complexity of evaluating the supremum objective once is $n \times K_t$, so
the additional time added to the standard bootstrap procedure is $O(n_\gamma \cdot n \cdot K_t)$.
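
The stochastic search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `stochastic_sup_inf`, the box-shaped search region, its half-width, and the toy objective are hypothetical choices and are not taken from the paper.

```python
import random


def stochastic_sup_inf(objective, center, half_width, n_draws=1000, seed=0):
    """Approximate the sup and inf of `objective` over a box around `center`.

    Candidate vectors are drawn uniformly from the hypercube
    [center - half_width, center + half_width]; the largest and smallest
    objective values over the draws stand in for the supremum and infimum.
    """
    rng = random.Random(seed)
    best_max = float("-inf")
    best_min = float("inf")
    for _ in range(n_draws):
        gamma = [c + rng.uniform(-half_width, half_width) for c in center]
        value = objective(gamma)
        best_max = max(best_max, value)
        best_min = min(best_min, value)
    return best_max, best_min


# Toy piecewise linear objective, [h'(v + gamma)]_+ - [h'gamma]_+,
# for a single (hypothetical) history h and a fixed vector v.
h, v = [1.0, -0.5], [0.3, 0.2]


def toy_objective(gamma):
    h_gamma = sum(x * y for x, y in zip(h, gamma))
    h_shift = sum(x * (y + z) for x, y, z in zip(h, v, gamma))
    return max(h_shift, 0.0) - max(h_gamma, 0.0)


upper, lower = stochastic_sup_inf(toy_objective, center=[0.0, 0.0], half_width=5.0)
```

Each objective evaluation here is cheap, so the cost is driven by the number of draws, in line with the complexity count given above.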

2 Extension of the ACI to more than two stages and 
more than two treatments 

In this appendix, we develop the ACI for the general case where there is an arbitrary finite
number of stages of treatment, and an arbitrary finite number of treatment choices at each 
stage. We begin with a review of the Q-learning procedure in this setting. 
2.1 Q-Learning in the general case

We observe an i.i.d. sample of trajectories $\{\mathcal{T}_i\}_{i=1}^n$ drawn from a fixed but unknown
distribution $P$. Each trajectory is of the form

$$\mathcal{T} = (X_1, A_1, Y_1, X_2, A_2, Y_2, \ldots, X_T, A_T, Y_T), \qquad (15)$$

being comprised of patient measurements $X_t$, assigned treatment $A_t$, and observed
response $Y_t$ for $t = 1, 2, \ldots, T$. For each decision point $t$ the assigned treatment $A_t$ is
coded to take values in the set $\{1, 2, \ldots, K_t\}$. As in the two-stage setting, we let $H_t$
denote a concise summary of patient history at time $t$. More precisely, $H_1$ is a known function
of $X_1$, and $H_t$ is a known function of $(X_1, A_1, Y_1, \ldots, X_{t-1}, A_{t-1}, Y_{t-1}, X_t)$ for $t = 2, 3, \ldots, T$. The
working model for the Q-function is of the same form as in Section 3 of the main
body of the paper. For each $t$ we use the model

$$Q_t(h_t, a_t; \beta_t) = \sum_{i=1}^{K_t} h_{t,1}^T \beta_{t,i} 1_{a_t = i}, \qquad (16)$$

where $\beta_t = (\beta_{t,1}^T, \beta_{t,2}^T, \ldots, \beta_{t,K_t}^T)^T$. The form of the foregoing model is chosen to produce
compact theoretical expressions. In practice, any coding that makes the model identifiable
can be used. Note, however, that the form of the pretest will depend on the coding. As in the
two stage setting, the form of the working model implies that when $h_{t,1}^T\beta^*_{t,i} - \max_{j \ne i} h_{t,1}^T\beta^*_{t,j} \approx 0$
for some $1 \le i \le K_t$, then at least two treatments are approximately optimal for a patient
with history $H_{t,1} = h_{t,1}$. That is, there is not a unique best treatment for a patient with
history $H_{t,1} = h_{t,1}$. On the other hand, if $|h_{t,1}^T\beta^*_{t,i} - \max_{j \ne i} h_{t,1}^T\beta^*_{t,j}| \gg 0$ for all $1 \le i \le K_t$,
then the working model implies that exactly one treatment is best for a patient with history
$H_{t,1} = h_{t,1}$. Once a working model has been specified, the Q-learning algorithm can be
applied to estimate the optimal DTR. The Q-learning algorithm is as follows:
1. Regress $Y_T$ on $H_T$ and $A_T$ using the working model (16) to obtain:

$$\hat\beta_T = \arg\min_{\beta_T} \mathbb{P}_n \left(Y_T - Q_T(H_T, A_T; \beta_T)\right)^2$$

and subsequently the approximation $Q_T(h_T, a_T; \hat\beta_T)$ to the conditional mean $Q_T(h_T, a_T)$.

2. (a) Recursively, define the predicted future reward following the optimal policy as:

$$\tilde Y_t = Y_t + \max_{a_{t+1} \in \{1, 2, \ldots, K_{t+1}\}} Q_{t+1}(H_{t+1}, a_{t+1}; \hat\beta_{t+1}) = Y_t + \max_{1 \le i \le K_{t+1}} H_{t+1,1}^T \hat\beta_{t+1,i}$$

for $t = T-1, T-2, \ldots, 1$.

(b) Regress $\tilde Y_t$ on $H_t$ and $A_t$ using the working model (16) to obtain
$\hat\beta_t = \arg\min_{\beta_t} \mathbb{P}_n (\tilde Y_t - Q_t(H_t, A_t; \beta_t))^2$.

3. Define the estimated optimal DTR $\hat\pi = (\hat\pi_1, \hat\pi_2, \ldots, \hat\pi_T)$ so that

$$\hat\pi_t(h_t) = \arg\max_{a_t \in \{1, 2, \ldots, K_t\}} Q_t(h_t, a_t; \hat\beta_t).$$

When $T = 2$ the above procedure is equivalent to the two stage Q-learning algorithm given
in Section 3 of the main body of the paper.
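
The backward recursion of the Q-learning algorithm can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names (`solve`, `least_squares`, `features`, `q_values`, `q_learning`, `policy`), the data layout, and the toy data are all assumptions, and the least squares fit uses plain normal equations, which presumes a full-rank design.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def least_squares(X, y):
    """Ordinary least squares via the normal equations (assumes full rank)."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


def features(h, a, K):
    """Copy h into the block for treatment a (coded 1..K); zeros elsewhere."""
    p = len(h)
    out = [0.0] * (p * K)
    out[(a - 1) * p:a * p] = h
    return out


def q_values(h, beta, K):
    """Fitted Q-value h' beta_i for each treatment i = 1..K."""
    p = len(h)
    return [sum(h[j] * beta[i * p + j] for j in range(p)) for i in range(K)]


def q_learning(H, A, Y, K):
    """H[t][i]: history features, A[t][i]: treatment, Y[t][i]: response; t = 0..T-1."""
    T, n = len(H), len(Y[0])
    betas = [None] * T
    future = [0.0] * n  # max over treatments of the next-stage fitted Q-values
    for t in reversed(range(T)):
        pseudo = [Y[t][i] + future[i] for i in range(n)]
        X = [features(H[t][i], A[t][i], K[t]) for i in range(n)]
        betas[t] = least_squares(X, pseudo)
        future = [max(q_values(H[t][i], betas[t], K[t])) for i in range(n)]
    return betas


def policy(h, beta, K):
    """Estimated optimal treatment (coded 1..K) for history features h."""
    q = q_values(h, beta, K)
    return 1 + q.index(max(q))


# Toy two-stage data with noiseless responses (hypothetical values).
xs = [0.0, 1.0, 2.0, 3.0]
H2 = [[1.0, x] for x in xs for _ in (1, 2)]
A2 = [a for _ in xs for a in (1, 2)]
Y2 = [1 + 2 * h[1] if a == 1 else 3 - h[1] for h, a in zip(H2, A2)]
H1 = [[1.0] for _ in H2]
A1 = [1, 2, 2, 1, 1, 2, 2, 1]
Y1 = [0.0] * 8
betas = q_learning([H1, H2], [A1, A2], [Y1, Y2], [2, 2])
```

The maximum taken when forming the pseudo-outcome is the non-smooth operation that makes the earlier-stage estimators nonregular.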

Our aim is to use the ACI to construct a confidence interval for $c^T\beta_1^*$ where $c$ is an
arbitrary vector of constants. The definition of $\beta_1^*$ is given inductively. Define

$$\beta_T^* \triangleq \arg\min_{\beta_T} P \left(Y_T - Q_T(H_T, A_T; \beta_T)\right)^2.$$

For $t = T-1, T-2, \ldots, 1$ define

$$Y_t^* \triangleq Y_t + \max_{1 \le i \le K_{t+1}} H_{t+1,1}^T \beta^*_{t+1,i},$$

$$\beta_t^* \triangleq \arg\min_{\beta_t} P \left(Y_t^* - Q_t(H_t, A_t; \beta_t)\right)^2.$$

At times we will focus on the problem of constructing a confidence interval for a linear
combination of the first stage coefficients $\beta_1^*$ since building a confidence interval for, say $c_t^T\beta_t^*$,
is equivalent to building a confidence interval for the first stage of a $T - t + 1$ stage trial.
That is, one can always view the $t$th stage as the first stage of a shorter $T - t + 1$ stage trial.
Information collected prior to the $t$th stage can be treated as baseline (pre-randomization)
information in this conceptual shorter trial.



2.2 ACI in the general case 

The ACI in the general case is conceptually the same as the two stage case. Non-regularity in
$\sqrt{n}(\hat\beta_t - \beta_t^*)$ arises whenever there are two or more equally best treatments at any future stage
of treatment $s > t$ for a non-null subset of patient histories. The ACI works by constructing
smooth upper and lower bounds on $\sqrt{n}(\hat\beta_t - \beta_t^*)$ and then bootstrapping these bounds to
construct confidence intervals. As in the two stage case, these bounds are asymptotically
equivalent to taking the supremum (infimum) over all local alternatives to the true generative
distribution.

In order to develop the ACI in this general setting, we generalize the notation given in
the main body of the paper. Define $B_t = (H_{t,1}^T 1_{A_t = 1}, \ldots, H_{t,1}^T 1_{A_t = K_t})^T$ so that instances of
$B_t$ form the columns of the design matrix used in the $t$th stage regression. Further, define
$\hat\Sigma_t = \mathbb{P}_n B_t B_t^T$. The limiting distribution of $\sqrt{n}(\hat\beta_t - \beta_t^*)$ depends abruptly on the frequency
of patients for which there are multiple equally optimal best treatments at a future stage.
Consequently, the set

$$\mathcal{A}^*_t(h_{t,1}) = \arg\max_{1 \le i \le K_t} h_{t,1}^T \beta^*_{t,i} \qquad (17)$$

of equally optimal treatments at stage $t$ for a patient with history $H_{t,1} = h_{t,1}$, is relevant
for the development of asymptotic theory. Notice that $\mathcal{A}^*_t(h_{t,1})$ is a singleton when there is
exactly one best treatment for a patient with history $h_{t,1}$. As in the two stage case, we will
need an estimator of $\mathcal{A}^*_t(h_{t,1})$. The estimator we use is based on the following test statistics:

$$\hat T_{t,i}(h_{t,1}) \triangleq \frac{n \left(h_{t,1}^T \hat\beta_{t,i} - \max_{j \ne i} h_{t,1}^T \hat\beta_{t,j}\right)^2}{h_{t,1}^T \hat\zeta_{t,i} h_{t,1}}, \qquad (18)$$

where $\hat\zeta_{t,i}$ is the usual plug-in estimator of $n\,\mathrm{Cov}(\hat\beta_{t,i} - \hat\beta_{t,k})$ evaluated at $k = k_i$, where
$k_i = \arg\max_{j \ne i} h_{t,1}^T \hat\beta_{t,j}$ (we are acting as if the maximizing index were known a priori.
In particular, if is the empirical mean square error of the tth stage regression, then 
we use atFnBtBj as our estimate of the asymptotic covariance of Thus, (t,i is given by 
Tit,ii — -|-Ej j^,^, where f^tji is the submatrix of atS'nBtBj corresponding to the estimator 
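of $\mathrm{Cov}(\hat\beta_{t,j}, \hat\beta_{t,i})$.) The statistic $\min_i \hat T_{t,i}(h_{t,1})$ should be large when there is exactly one best
treatment for a patient with history $H_{t,1} = h_{t,1}$. On the other hand, $\min_i \hat T_{t,i}(h_{t,1})$ should be
small if there is more than one best treatment. Thus, a natural estimator of $\mathcal{A}^*_t(h_{t,1})$ is

$$\hat{\mathcal{A}}_t(h_{t,1}) = \begin{cases} \{i : \hat T_{t,i}(h_{t,1}) \le \lambda_n\} & \text{if } \min_i \hat T_{t,i}(h_{t,1}) \le \lambda_n, \\ \arg\max_{1 \le i \le K_t} h_{t,1}^T \hat\beta_{t,i} & \text{if } \min_i \hat T_{t,i}(h_{t,1}) > \lambda_n. \end{cases}$$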



The merits and genesis of this statistic were discussed in the main body of the paper. Under
the regularity conditions given in the next section, it follows that $\hat{\mathcal{A}}_t(h_{t,1})$ is a consistent
estimator of $\mathcal{A}^*_t(h_{t,1})$. In a slight abuse of notation, let $\hat{\mathcal{A}}_t(h_{t,1})$ also denote $\hat{\mathcal{A}}_t(h_{t,1}) \cup \mathcal{A}^*_t(h_{t,1})$,
the union of the estimator and the estimand.
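
As an illustration, the pretest-based estimator of the set of optimal treatments can be sketched as below. The sketch assumes the test statistic has the squared, studentized form of (18) and that the normalizing matrices are supplied by the caller; the names `quad_form`, `estimate_optimal_set`, `betas`, `zetas`, and `lam` are hypothetical.

```python
def quad_form(h, M):
    """Quadratic form h' M h for a list-of-lists matrix M."""
    d = len(h)
    return sum(h[r] * M[r][c] * h[c] for r in range(d) for c in range(d))


def estimate_optimal_set(h, betas, zetas, n, lam):
    """Estimate the set of optimal treatments (coded 1..K) at history h.

    betas[i] is the coefficient vector for treatment i + 1, zetas[i] is the
    normalizing covariance matrix for treatment i + 1 (assumed given),
    n is the sample size and lam is the pretest threshold lambda_n.
    """
    q = [sum(hj * bj for hj, bj in zip(h, b)) for b in betas]
    stats = []
    for i, qi in enumerate(q):
        best_other = max(qj for j, qj in enumerate(q) if j != i)
        stats.append(n * (qi - best_other) ** 2 / quad_form(h, zetas[i]))
    if min(stats) <= lam:
        # Pretest suggests a (near) tie: keep every treatment that passes.
        return {i + 1 for i, s in enumerate(stats) if s <= lam}
    # Otherwise fall back to the plug-in argmax.
    best = max(q)
    return {i + 1 for i, qi in enumerate(q) if qi == best}


identity = [[[1.0]], [[1.0]], [[1.0]]]
tied = estimate_optimal_set([1.0], [[1.0], [1.0], [0.0]], identity, n=100, lam=5.0)
unique = estimate_optimal_set([1.0], [[2.0], [1.0], [0.0]], identity, n=100, lam=5.0)
```

In the first call treatments 1 and 2 have identical fitted values, so both are retained; in the second call every statistic exceeds the threshold and only the plug-in maximizer is returned.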

Recall that we denote $V_{t,n} \triangleq \sqrt{n}(\hat\beta_t - \beta_t^*)$ and $V_{t,n,i} \triangleq \sqrt{n}(\hat\beta_{t,i} - \beta^*_{t,i})$ for $t = 1, \ldots, T$.
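For any $\gamma_{t+1} \in \mathbb{R}^{p_{t+1}}$ and $\Gamma_{t+1} = (\gamma_{t+1}^T, \gamma_{t+2}^T, \ldots, \gamma_T^T)^T \in \mathbb{R}^{\sum_{k=t+1}^T p_k}$, $t = 1, \ldots, T-1$, define

$$\begin{aligned} \mathbb{V}_{T-1,n}(\Gamma_T) = {} & W'_{T-1,n} + \hat\Sigma_{T-1}^{-1} \mathbb{P}_n B_{T-1} U_{T,n} 1_{\#\hat{\mathcal{A}}_T(H_{T,1}) = 1} \\ & + \hat\Sigma_{T-1}^{-1} \mathbb{P}_n B_{T-1} \Big( \max_{i \in \hat{\mathcal{A}}_T(H_{T,1})} H_{T,1}^T (V_{T,n,i} + \gamma_{T,i}) - \max_{i \in \hat{\mathcal{A}}_T(H_{T,1})} H_{T,1}^T \gamma_{T,i} \Big) 1_{\#\hat{\mathcal{A}}_T(H_{T,1}) > 1}, \end{aligned}$$

and for $t < T - 1$,

$$\begin{aligned} \mathbb{V}_{t,n}(\Gamma_{t+1}) = {} & W'_{t,n} + \hat\Sigma_t^{-1} \mathbb{P}_n B_t \max_{j \in \hat{\mathcal{A}}_{t+1}(H_{t+1,1})} H_{t+1,1}^T \mathbb{V}_{t+1,n,j}(\Gamma_{t+2}) 1_{\#\hat{\mathcal{A}}_{t+1}(H_{t+1,1}) = 1} \\ & + \hat\Sigma_t^{-1} \mathbb{P}_n B_t \Big( U_{t+1,n} - \max_{i \in \hat{\mathcal{A}}_{t+1}(H_{t+1,1})} H_{t+1,1}^T V_{t+1,n,i} \Big) 1_{\#\hat{\mathcal{A}}_{t+1}(H_{t+1,1}) = 1} \\ & + \hat\Sigma_t^{-1} \mathbb{P}_n B_t \Big( \max_{i \in \hat{\mathcal{A}}_{t+1}(H_{t+1,1})} H_{t+1,1}^T \big( \mathbb{V}_{t+1,n,i}(\Gamma_{t+2}) + \gamma_{t+1,i} \big) - \max_{i \in \hat{\mathcal{A}}_{t+1}(H_{t+1,1})} H_{t+1,1}^T \gamma_{t+1,i} \Big) 1_{\#\hat{\mathcal{A}}_{t+1}(H_{t+1,1}) > 1}, \end{aligned}$$

where

$$W'_{t,n} = \hat\Sigma_t^{-1} \sqrt{n}\, \mathbb{P}_n B_t \Big[ Y_t + \max_{l = 1, \ldots, K_{t+1}} H_{t+1,1}^T \beta^*_{t+1,l} - B_t^T \beta_t^* \Big],$$

$$U_{t,n} = \sqrt{n} \Big( \max_{1 \le i \le K_t} H_{t,1}^T \hat\beta_{t,i} - \max_{1 \le i \le K_t} H_{t,1}^T \beta^*_{t,i} \Big),$$

for $t = 1, 2, \ldots, T-1$. Then $\mathbb{V}_{t,n}((\sqrt{n}\beta^{*T}_{t+1}, \ldots, \sqrt{n}\beta^{*T}_T)^T) = \sqrt{n}(\hat\beta_t - \beta_t^*)$ for $t = 1, \ldots, T-1$.

The upper bound on $c_t^T\sqrt{n}(\hat\beta_t - \beta_t^*)$ used to construct a confidence interval for $c_t^T\beta_t^*$ is given by
$\mathcal{U}_t(c_t) = \sup_{\Gamma_{t+1} \in \mathbb{R}^{\sum_{k=t+1}^T p_k}} c_t^T \mathbb{V}_{t,n}(\Gamma_{t+1})$. Similarly, the lower bound is obtained by replacing
the sup with an inf.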
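
To make the role of the supremum concrete, the following sketch evaluates the part of the last-stage bound that depends on $\gamma$, namely the sample average of a weight times the difference of two maxima over the estimated set of optimal treatments. It is an illustration under assumptions: the per-subject scalar weights (standing in for $c^T\hat\Sigma^{-1}B$), the data layout, and the function names are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def sup_term_objective(gamma, weights, histories, v, sets):
    """Gamma-dependent term of the bound, averaged over subjects.

    gamma[j-1], v[j-1]: vectors attached to treatment j (coded from 1).
    weights[i]: precomputed scalar weight for subject i (assumed given).
    histories[i]: history features of subject i.
    sets[i]: estimated set of optimal treatments for subject i.
    Only subjects whose set has more than one element contribute.
    """
    total = 0.0
    for w, h, A in zip(weights, histories, sets):
        if len(A) > 1:
            shifted = max(dot(h, v[j - 1]) + dot(h, gamma[j - 1]) for j in A)
            baseline = max(dot(h, gamma[j - 1]) for j in A)
            total += w * (shifted - baseline)
    return total / len(weights)


weights = [1.0, 2.0]
histories = [[1.0], [1.0]]
v = [[0.5], [-0.5]]
sets = [{1, 2}, {1}]
at_zero = sup_term_objective([[0.0], [0.0]], weights, histories, v, sets)
at_shift = sup_term_objective([[0.0], [10.0]], weights, histories, v, sets)
```

The objective is piecewise linear in `gamma`, so its value can change sign as `gamma` moves, which is why the bound takes a supremum (or infimum) over `gamma` rather than plugging in a single value.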
2.2.1 Example: ACI for three stages

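To illustrate the ACI for the general case and solidify the ideas presented in the preceding
section, we provide the bounds for the case where there are three stages of treatment and an
arbitrary number of treatments at each stage. Thus, $T = 3$ and $V_{3,n} = \sqrt{n}(\hat\beta_3 - \beta_3^*)$. Since
$\hat\beta_3$ is the usual least squares estimator, it follows under (A1)-(A2) (see below) that $V_{3,n}$ is
regular and its limiting distribution is normal. The process $\mathbb{V}_{2,n}(\gamma_3)$ indexed by $\gamma_3 \in \mathbb{R}^{p_3}$ is
defined as follows

$$\begin{aligned} \mathbb{V}_{2,n}(\gamma_3) = {} & W'_{2,n} + \hat\Sigma_2^{-1} \mathbb{P}_n B_2 U_{3,n} 1_{\#\hat{\mathcal{A}}_3(H_{3,1}) = 1} \\ & + \hat\Sigma_2^{-1} \mathbb{P}_n B_2 \Big( \max_{i \in \hat{\mathcal{A}}_3(H_{3,1})} H_{3,1}^T (V_{3,n,i} + \gamma_{3,i}) - \max_{i \in \hat{\mathcal{A}}_3(H_{3,1})} H_{3,1}^T \gamma_{3,i} \Big) 1_{\#\hat{\mathcal{A}}_3(H_{3,1}) > 1}. \end{aligned}$$

An ACI for $c_2^T\beta_2^*$ is formed using the bootstrap distribution of the bounds $\mathcal{U}_2(c_2) = \sup_{\gamma_3 \in \mathbb{R}^{p_3}} c_2^T \mathbb{V}_{2,n}(\gamma_3)$
and $\mathcal{L}_2(c_2) = \inf_{\gamma_3 \in \mathbb{R}^{p_3}} c_2^T \mathbb{V}_{2,n}(\gamma_3)$.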

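To form a confidence interval for the first stage coefficients, e.g. $c_1^T\beta_1^*$, we use the process
$\mathbb{V}_{1,n}((\gamma_2, \gamma_3))$ indexed by $(\gamma_2, \gamma_3) \in \mathbb{R}^{p_2 + p_3}$. The definition of $\mathbb{V}_{1,n}((\gamma_2, \gamma_3))$ is given by

$$\begin{aligned} \mathbb{V}_{1,n}((\gamma_2, \gamma_3)) = {} & W'_{1,n} + \hat\Sigma_1^{-1} \mathbb{P}_n B_1 \max_{i \in \hat{\mathcal{A}}_2(H_{2,1})} H_{2,1}^T \mathbb{V}_{2,n,i}(\gamma_3) 1_{\#\hat{\mathcal{A}}_2(H_{2,1}) = 1} \\ & + \hat\Sigma_1^{-1} \mathbb{P}_n B_1 \Big( U_{2,n} - \max_{i \in \hat{\mathcal{A}}_2(H_{2,1})} H_{2,1}^T V_{2,n,i} \Big) 1_{\#\hat{\mathcal{A}}_2(H_{2,1}) = 1} \\ & + \hat\Sigma_1^{-1} \mathbb{P}_n B_1 \Big( \max_{i \in \hat{\mathcal{A}}_2(H_{2,1})} H_{2,1}^T \big( \mathbb{V}_{2,n,i}(\gamma_3) + \gamma_{2,i} \big) - \max_{i \in \hat{\mathcal{A}}_2(H_{2,1})} H_{2,1}^T \gamma_{2,i} \Big) 1_{\#\hat{\mathcal{A}}_2(H_{2,1}) > 1}. \end{aligned}$$

Thus, the upper and lower bounds used for constructing a confidence interval for $c_1^T\beta_1^*$ are
given by $\mathcal{U}_1(c_1) = \sup_{\gamma_2 \in \mathbb{R}^{p_2}, \gamma_3 \in \mathbb{R}^{p_3}} c_1^T \mathbb{V}_{1,n}((\gamma_2, \gamma_3))$ and $\mathcal{L}_1(c_1) = \inf_{\gamma_2 \in \mathbb{R}^{p_2}, \gamma_3 \in \mathbb{R}^{p_3}} c_1^T \mathbb{V}_{1,n}((\gamma_2, \gamma_3))$.
The form of $\mathbb{V}_{2,n}(\gamma_3)$ and $\mathbb{V}_{1,n}((\gamma_2, \gamma_3))$ shows that computing the bounds $\mathcal{U}_1(c_1)$ and $\mathcal{L}_1(c_1)$
requires optimizing piecewise linear objective functions. Since these piecewise linear objectives
are non-convex (non-concave) the resultant optimization problem is, to the best of our
knowledge, a mixed integer program. A simple stochastic approximation is given in Section
1.3 of this supplement.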

2.3 Properties of the ACI in the general case 

In this section, we state the general case analogs of the theorems given in the main body 
of the paper. In particular, these results state that the ACI provides asymptotically valid 
confidence intervals under mild regularity conditions. In addition, under further assumptions, 
it can be shown that the ACI delivers asymptotically exact coverage.
We will make the following assumptions. 

(A1) The histories $H_t$, features $B_t$, and outcomes $Y_t$, satisfy the moment inequalities

$P\|H_t\|^2\|B_{t-1}\|^2 < \infty$ for all $t = 2, 3, \ldots, T$, and $P Y_t^2\|B_t\|^2 < \infty$ for all $t = 1, 2, \ldots, T$.

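(A2) Define:

1. $\Sigma_{t,\infty} = P B_t B_t^T$.

2. $g_T(B_T, Y_T; \beta_T^*) = B_T(Y_T - B_T^T\beta_T^*)$.

3. $g_t(B_t, Y_t, H_{t+1}; \beta_t^*) \triangleq B_t\left(Y_t + \max_{1 \le k \le K_{t+1}} H_{t+1,1}^T\beta^*_{t+1,k} - B_t^T\beta_t^*\right)$;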

then the matrices Sj^oq foi' ^ = ■ ■ ■ iT — 1^ and Vt = Cov {{gl-, gl, ■ ■ ■ , g-rY) are strictly 
positive definite. 

(A3) The sequence $\lambda_n$ tends to infinity and satisfies $\lambda_n = o(n)$.

(A4) There exists a sequence of local alternatives $P_n$ converging to $P$ in the sense that

$$\int \left[\sqrt{n}\left(dP_n^{1/2} - dP^{1/2}\right) - \tfrac{1}{2}\, g\, dP^{1/2}\right]^2 \to 0$$

for some measurable real-valued function $g$ for which $P_n\|H_t\|^2\|B_{t-1}\|^2$ is a bounded
sequence for $t = 2, 3, \ldots, T$, and $P_n Y_t^2\|B_t\|^2$ is a bounded sequence for $t = 1, 2, \ldots, T$.

These assumptions are quite mild, requiring the kind of moment and collinearity constraints
which are often encountered in multiple regression. Assumption (A3) concerns a user-controlled
parameter and is thus readily satisfied. Assumption (A4) implies that

$$\beta^*_{t,n} = \arg\min_{\beta_t} P_n\Big(Y_t + \max_{1 \le i \le K_{t+1}} H_{t+1,1}^T\beta^*_{t+1,n,i} - B_t^T\beta_t\Big)^2 \quad \text{for } t = 1, 2, \ldots, T-1,$$

satisfies $\lim_{n \to \infty} \sqrt{n}(\beta^*_{t,n} - \beta_t^*) = \gamma_t^*$ for some $\gamma_t^* \in \mathbb{R}^{p_t}$, $t = 1, \ldots, T$.

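Let $\mathbb{V}_{t,\infty}(\Gamma_{t+1})$ denote the limiting process of $\mathbb{V}_{t,n}(\Gamma_{t+1})$ indexed by $\Gamma_{t+1} \in \mathbb{R}^{\sum_{k=t+1}^T p_k}$,
$W'_{t,\infty}$ denote the limiting distribution of $W'_{t,n}$, and $V_{t,\infty}$ denote the limiting distribution
of $V_{t,n} = \sqrt{n}(\hat\beta_t - \beta_t^*)$. Let $V_{t,\gamma^*,\infty}$ denote the limiting distribution of $\sqrt{n}(\hat\beta_t - \beta^*_{t,n})$ for
$t = 1, \ldots, T$. In particular, note that $V_{T,\gamma^*,\infty} = V_{T,\infty}$, which does not depend on $\gamma^*$. Under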
local alternatives P„, the analog of At{ht^i) is defined as At,n{ht,i) — A.t{ht^i) \J Af n{ht,i) , 
where 

argmaxi=i,...,;^, h\^l3l^^i if min^ Tt,„,i(/it,i) > A„ 

Tt,n,i{ht,i) = n{hl^Pl^^i - maxj^j hl^^l^^f / {hl^C,t,i,ooht,i) and Cm,oo is the limit of Cm for 
i = 1, . . . , Kf. We have the following theorem. 

Theorem 2.1 (Validity of Population Bounds). Assume (A1)-(A3) and fix $c_t \in \mathbb{R}^{\dim(\beta_t^*)}$.
1. The limiting distribution of $c_t^T\sqrt{n}(\hat\beta_t - \beta_t^*)$ is given by the distribution of



ieyit+i(ift+i,i) 



A*,n(/^t,l) 



for $t = T-1, \ldots, 1$.
2. If, in addition (A4) holds, then the limiting distribution of $c_t^T\sqrt{n}(\hat\beta_t - \beta^*_{t,n})$ is given by the
distribution of



l#^,%i(i?«+i.i)>i 



for $t = T-1, \ldots, 1$.

3. The limiting distribution of $\mathcal{U}_{T-1}(c_{T-1})$ under $P$ or $P_n$ is given by the distribution of

+ sup cJr_i^T-i,ooPBT-i[ . max H^^i{YT,oo,i+lT,i)- , max H^,i'yT,i)l#A'^{Hr,i)>i; 

and for $t < T - 1$, the limiting distribution of $\mathcal{U}_t(c_t)$ under $P$ or $P_n$ is given (recursively)
by the distribution of

c?W; + sup IcjE-^PBt max Hj^^^Yt+i,oo,ii^t+2)'i-#A;+,iHt,i)=i 

max Hj^i^^ ( Vt+i,oo,i(rt+2) + Jt+i,i) 



ieA^y 



When $T = 2$, these limiting distributions are equal in law to the limiting distributions of
$\mathcal{U}(c)$ and $\mathcal{L}(c)$ given in Section 2 of the main body of the paper (after appropriate recoding).
The preceding theorem shows that the limiting distribution of $\mathcal{U}_t(c_t)$ is stochastically larger
than that of $c_t^T\sqrt{n}(\hat\beta_t - \beta_t^*)$. A similar result can be stated in terms of a lower bound $\mathcal{L}_t(c_t)$
by replacing the sup by an inf in the preceding theorem. The theorem is proved recursively
using the results proved for the two stage case and then repeatedly invoking the continuous
mapping theorem.

In order to form a confidence interval, the bootstrap distributions of $\mathcal{U}_t(c_t)$ and $\mathcal{L}_t(c_t)$,
which we denote by $\mathcal{U}_t^{(b)}(c_t)$ and $\mathcal{L}_t^{(b)}(c_t)$, are used. The next result states that the bootstrap
bounds are asymptotically consistent.

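Theorem 2.2. Assume (A1)-(A3) and fix $c_t \in \mathbb{R}^{\dim(\beta_t^*)}$ for $t = 1, \ldots, T-1$. Then
$(\mathcal{U}_t(c_t), \mathcal{L}_t(c_t))$ and $(\mathcal{U}_t^{(b)}(c_t), \mathcal{L}_t^{(b)}(c_t))$ converge to the same limiting distribution in
probability for all $t$. That is,

$$\sup_{\nu \in BL_1(\mathbb{R}^2)} \left| E\,\nu\big((\mathcal{U}_t(c_t), \mathcal{L}_t(c_t))\big) - E_M\,\nu\big((\mathcal{U}_t^{(b)}(c_t), \mathcal{L}_t^{(b)}(c_t))\big) \right| \to_{P^*} 0$$

for $t = 1, \ldots, T-1$.

Corollary 2.3. Assume (A1)-(A3) and fix $c_t \in \mathbb{R}^{\dim(\beta_t^*)}$ for $t = 1, \ldots, T-1$. Let $\hat u$ denote
the $1 - \alpha/2$ quantile of $\mathcal{U}_t^{(b)}(c_t)$ and $\hat l$ denote the $\alpha/2$ quantile of $\mathcal{L}_t^{(b)}(c_t)$. Then for all $t$

$$P_M\left(c_t^T\hat\beta_t - \hat u/\sqrt{n} \le c_t^T\beta_t^* \le c_t^T\hat\beta_t - \hat l/\sqrt{n}\right) \ge 1 - \alpha + o_P(1).$$

Furthermore, for a given value of $t$ if $P\left(\min_{i=1,\ldots,K_s} \left|H_{s,1}^T\beta^*_{s,i} - \max_{j \ne i} H_{s,1}^T\beta^*_{s,j}\right| = 0\right) = 0$ for
all $s = t+1, t+2, \ldots, T$, then the above inequality can be strengthened to equality.

The preceding result shows that the ACI can be used to construct valid confidence intervals
regardless of the underlying parameters or generative model. In addition, when there is
almost always a unique best treatment, then the ACI delivers asymptotically exact confidence
intervals.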
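
As a sketch of how bootstrap draws of the bounds turn into an interval, the code below assumes the interval has the form (estimate minus upper quantile over root n, estimate minus lower quantile over root n), with the upper quantile taken from the bootstrap upper bounds and the lower quantile from the bootstrap lower bounds. That form, the quantile convention, and the function names are assumptions made for illustration.

```python
import math


def quantile(sorted_values, q):
    """Smallest order statistic with at least a fraction q of values at or below it."""
    k = max(0, math.ceil(q * len(sorted_values)) - 1)
    return sorted_values[k]


def aci_interval(estimate, upper_draws, lower_draws, n, alpha=0.05):
    """Interval for a linear combination of coefficients from bootstrap bounds.

    estimate: point estimate of the linear combination.
    upper_draws, lower_draws: bootstrap replicates of the upper and lower bounds.
    n: sample size used to rescale the root-n bounds.
    """
    u = quantile(sorted(upper_draws), 1 - alpha / 2)
    l = quantile(sorted(lower_draws), alpha / 2)
    root_n = math.sqrt(n)
    return estimate - u / root_n, estimate - l / root_n


# Hypothetical bootstrap replicates, for illustration only.
upper_draws = [float(k) for k in range(1, 101)]
lower_draws = [-float(k) for k in range(1, 101)]
low, high = aci_interval(1.0, upper_draws, lower_draws, n=100)
```

Because the upper quantile is subtracted to form the lower endpoint, wide bootstrap bounds translate directly into a wider, more conservative interval.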
2.4 Sketched proofs for the ACI in three stages and more than
two treatments

In this section, we provide proof sketches for Theorems 2.1 and 2.2 for $T = 3$. Results for
$T > 3$ can be proved in a similar fashion and are omitted. Note that $\hat\zeta_{t,i}$ used in (18) is
for normalization purposes and its value does not affect the asymptotic results. Although
by definition $\hat\zeta_{t,i}$ depends on $h_{t,1}$, this dependence is only through $k_i$ and thus $\hat\zeta_{t,i}$ takes a finite
number (at most $K_t - 1$) of values. Thus throughout this section we assume without loss of
generality that $\hat\zeta_{t,i}$ does not depend on $h_{t,1}$ and is positive definite for all $n$.
Stage 3. For any C3 G M^^ define the function W3 : Vp^ x /"^(Ja) x M^^ M as 

where J3 = {/(63,l/3) = a^b,{ys - : a,/3se max{| |a| |, | j/^sl |} < K}. Then 

and clV^0s - = ^3(^3, V^(Pn - Pn),Pln)- 

The limiting behavior of the above quantities can be obtained using the same arguments as 
those in Theorem 1.1. 

Stage 2. For a given positive integer J, let r2j(MP) denote the set of functions MP — >■ 
{{1}, . . . , {J}, {1, 2}, {1, 3}, . . . , {1, J}, . . . , {1, . . . , J}}. For any a G W^, define the follow- 
ing functions. 
1. W2,i : Dp^ X l^{j^2,i) X /°°(^2,i) X X Rp^+p^ ^ R is defined as 



W2,i(E2,6^,Ai,I/3,/3) = 



l=l,...,K3 



^2^2 (. max i/J ii/3,i)l#^.(//3,i)=i 



where J"2,i = [fih, Vs, ^3,1) = a^&2(y2 + maxj=i,...,;f3 hl ^ps^i - blpi) + a''^62l#^*(fe3,i)=i 

(max,e^.(,3,,)/i5,iZ/3,.) : a, a' G M^'^/3 = (Z?!,/?!)^ e Rp^+^^ps = Ki, • • • , ^aVs)' ^ 
Mf3,niax{||a||,||a'||,||/3||,||z/3||}<i^}. 



2. W2,2 : X /°°(^2,2) X X ^ M is defined as 



^"2,2(^2,//, 1^3, 73) = 



4^2 ( . max [Hl ^v^i^i + ^^3^173,^) 
yie^3(-H3,i) 



where ^2,2 = |/(&2,/i3,i) = a^&2(maXi6^*(/,3_,)(/iT ^1/3,^+/^^ ^73,i)-maXie^j(/,3_,) /ij i^3,i) 
l#^*(?.3,i)>i : « e R^'^73, i^3 e R^^max{||a||, II1/3II} < i^}- 

3. p2,o : X l°°{^2,o) X Oi^3(RP3/^3) X R^^ x R^^ ^ R is defined as 



X 



P2,o(S2,/i,^3,i^3,73) = 



c^S2^52 max (i/JiZ/3,j + i/J^73,i) - mjDC i/3^,i73,, 
' ieA3{H3^i) teA3{H3^i) 



wlicrc J^2,o = {/(&2,/i3,i) = a'^b2{maXi^^^(^h3,i){^,i^3,i+hl^ij3,i)-^i^^iGA3i^^^^^^ 
R^'3,max{||a||,||i/3||} < i^,^3 e Qi^3(K^'/'^')} 

and note that P3/K3 is the dimension 

of/3* for^ = l,...,ir3. 
4. p2,i : Dp, X DJ^/^^ X ... X Dl^ij,^ x Z°°(J-2,i) x ^kM'"'''') x x x Rf^ x R ^ 
is defined as 



P2,l(S2, Cs.l, ■ ■ ■ , C3,/f3> -^3, i^3, ?73, 73, A) = /i 



4E2^52( max f//3V^3,i + ^^3,173,0 

\ ieyl3(-H3,l) 

wfiere 7^2,1 = 1/(^)1, /is.i) = (maxig^3(ft3^) (/i^,ii/3,i + ^3,i73,i) - ^^^-^eAsih^,!) ^3,i73,i) 

X (l . [^l^'3,i+''5l'73,^-max,.#i(^.5l.3J+hJ,^3,,)]2 " l#^|(h3,l)>l ) ^ « ^ R^^ ^^3 , ?73 , 73 ^ 

minj '- -J — - — '- '- <-A / 

''3,l''3,i'»3,l 

R^'3,max{||a||,||i/3||} <K,A^e nKM'"'''')^^ %Cz.i e ^^3/^3 i = h...,K^y 
5. p2,2 : Dp, X l°°{^2,2) X Rf3 X Rp^ R, defined as 



P2,2(S2,/i, t^3,?73) = 



c^S2^52 , max {Hl^us^i + Hl^r]3,i) 

\ «=1,...,K3 

- . max Hl^r]3,^ - max HI^U3M#auh3,,)=i 

wfiere J^2,2 = {/(^2, /i3,i) = a^^2(niaxj=i,...,K3 (^3,i^3,i + hl ir]3,i) -niaxj=i,...,/f3 hl ^ria^i- 
^^ieAUh3,i)hl^i^3,i)'i-#AUh3,i)=i ■ ^3,V3 ^ , max{| |a| | , | ji/g 1 1} < x|. 

Define Vc, : r>p2 x i?j3/;^3 x . . . x D^^^j,^ x /°°(J-2,i) x 1^{T2) x Q,^3(Rf3/^3) x RP2+P3 x RP3 x 
R^'3 X R^'3 X R R as 

Vc2(^2, C3,i, ■ ■ ■ , (3,^3,^^, 1^, A3, P, 1^3, V3, 73) 

= W2,liT,2,U,fl,U3,/3) +P2,2(S2,/i, 2/3,^73) " ^2,1(^2,(3,!, • • ■ , C3,Ks, fi, A3, 1^3, Vs, VsAn) 
+ ^^2,2(^2, /X, i^3, 73) + P2,o(S2, fJ-, A3, 1^3, 73) 

+ P2,l(S2, Cs,!, • • • , C3,K3,I^, A3, V3, V3, 73, An), (19) 






where T2 — ^2,1 ® ^2,2 ® ^1,0 ® •>'^,i ® -^1,2, and © denotes element-wise addition. Then 

C\^{P2 - PI) =Vc, (S2, C3,l> • • • > C3,i^3, Gn, P„, I3, (/32*', \/^(^3 - /^s), x/^/^a, x/^/^s), 

Cly/n02 - ^2*n) =^^02(^2, C3,l, ■ ■ ■ , 4^3' 

The upper bound $\mathcal{U}_2(c_2)$ under $P$ is

W2(C2) = sup V,, (S2, Cs,!, • • • , C3,X3, <G„, Pn, ^^g, (/32*\ PV)\ ' 1^1), V^l^hlz) ■ 

73eM''3 \ / 
The upper bound $\mathcal{U}_2(c_2)$ under $P_n$ is

W2(C2) = sup V,, (E2, 6,1, ■ ■ ■ , C3,i^3, V^(Pn - ^n), Pn, An, (^^Jn, /^sjn)', V^(/^3 - ^3*n), V^/^S.n, 73) • 
736MP3 ^ / 

And the bootstrap analog of the upper bound is 

Uf\c,) = sup V,, V^(Pi'^ -Pn),Pi^4'^(^2^^^^^ ■ 

Results for $t = 2$ in Theorems 2.1 and 2.2 rely on the desired continuity of $w_{2,1}$ and
$w_{2,2}$ and negligibility of the error terms $\rho_{2,0}$, $\rho_{2,1}$ and $\rho_{2,2}$, which can be proved using similar
arguments as those in Section 1.2 and are omitted. Below we show the negligibility of $\rho_{2,0}$
as an example.

Theorem 2.4. Assume (Al)-(A3). Then 

1- sup^ggKPa |p2,o(S2,Pn,^3,^A^03 - \ -^p 0/ and 

2. sup^3g^.3 |p2,o(S?\P^'\4''\v^(/?f -/?3),73)| ^Pm a.s. P. 
If, in addition (A4) holds, then 
Proof. For any probability measure $\mu \in l^\infty(\mathcal{F}_{2,0})$ and positive definite $\Sigma_2$, we have



P2,o(S2, A, i^3, 73) I = c^S2 ( max (i^a^ ii^3,i + ^I,i73,i) - max Hl ^-^^ 



A (^3,li^3,i + ^3,l73,i) + ^^3,l73,i) l#^5(//3,l)>lU3(//3,l)M3(^^3,l) 



4^2 ^^2! f max |i/3\ii/3,i| + max l^^a^i'^s,*! ) ^Az{Hz,^)^ai{Hz,,) 



ieAliHz^i) 

< -^11^311/^ [II-B2II ||-f^3,l||l^3(if3,l)^^*(i?3.l)] 



for a sufficiently large constant K, where the first inequality follows from the fact that 
I maxj(aj+6j)— maxj bi\ < maxj |aj| for any vectors a and b of the same dimension, and the sec- 
ond inequality follows from Cauchy-Schwarz inequality and the fact that maxj=i^...^K^3 \ \^3,i\ \ — 

To prove result 1, note that $\hat\Sigma_2$ is positive definite for sufficiently large $n$. Thus



\p2,o{^2,K,A3,Mk- < K\\V^0S- P;)\\F^ ||i?2||||i/3,l||l^(^^3.i)^^J(//3.i 



Below we show that P„[| 1^2 1 1 ll^3,i||1^3(^^3^,)^^«(^^3^,)] = op(l). Define A3 (/i3,i) = HMh3,i)AAl{h3,i)}. 
Then ^A3{H3i)ytAi{H3i) — lA3(if3,i)>o- For any S > 0, choose e sufficiently small so that 
P||i?2|| ||i?3,i||l_f/3 i^B3^ < rj/2, where ^3^^ is as defined in Lemma 2.7. Then 



Pn 1 1^2 1 1 Il-f^3,l||l^(if3,i)^^*(if3_i) < Pn||-B2|| | |-f^3,l| |lA3(i?3,i)>0 

< Pn||-B2|| ||-f^3,l|| sup 1a3(/j3,i)>0 + Pn||-B2| I I |i^3,l| 1 1/13,1^53,6 < ^ 

/J3,ie-B3,£ 

with probability tending to one by appeal to Lemma 2.7, the LLN, and Slutsky's theorem.
This together with the fact that $\|\sqrt{n}(\hat\beta_3 - \beta_3^*)\| = O_P(1)$ completes the proof of result 1.
Similar arguments can be used to prove results 2 and 3, and are omitted.



□ 



Next we will provide proof sketches for Theorems 2.1 and 2.2 for $t = 1$.
Stage 1. Let $c_{(j)}$ be the $j$-th column of $I_{p_2 \times p_2}$. Note that by definition

V2,„(73) = (vcij){f^2, Ui, Uks, p„, A3, p;'y, v^ik - p;), v^p;,^3)Y'' 



1^ ^;c(p.)(S2,4l,■■■,4/^3,'Gn,Pn,A,(^2*^OT^V^(/^3-/?31,V^/?3^ ) 



The analog of $\mathbb{V}_{2,n}$ under local alternatives $P_n$ is

, ■ / ^ ^ ^ ^ ^ \ P2 

VC;(73) = ivcij){^2, C3,l, • ■ • , C3,K„V^{rn - Pn), Pn, ^3,n, W*2!n, ^^^S^nY , V^Ws ' ^^In), ^Pl,n. 73)) .^^ 

(20) 

And the bootstrap analog of $\mathbb{V}_{2,n}$ is

V^?.(73) = (^cO)(sf ,2't..,gl,Vn(Pi^^ -Pn),Pi^4^(/3^^ ■ 

\ / 7=1 



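Let $\prod_{p_2} l^\infty(\mathbb{R}^{p_3})$ denote the set of bounded $p_2$ vector-valued functions on $\mathbb{R}^{p_3}$ equipped

with the sup norm (over $\{1, \ldots, p_2\} \times \mathbb{R}^{p_3}$). For any $c_1 \in \mathbb{R}^{p_1}$, we define the following
functions.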



1. wi^i : Dp^ X /°°(^i,i) X ^ M is defined as 



wi,i(Ei,a;,/3) = u 



clEr'Pi + max Hl^^^,, - Bl^^ 



(21) 



where J'l^i = {/(&i, /i2,i) = a'^h {vi + t^Q'^=i,...,K2 h\iP2,i - blPi) : a e M^'i,/3 
(/3[,/3l)T e M^'l+f^max{||a||, < X}. 
2. ^1,2 : Dp^ X /°°(J"i,2) X UpJ'^iWP^) X ^ M is defined as 



Wl,2(Si,/X,0,73) = /X 



c{Ei El max ^1,10^(73) l#^|(i/2,i)=i 



(22) 



where J"i,2 = {/(&i,/i2,i) = a^fci maxjg^.(;j2,i) ^2,1^^^(73) l#^^(/i2.i)=i : « e RP\(j) 

(01, • • • e np,/~(M?'3),^3 e rp3j|o|| < k}. 

3. ^1,3 : i^pi X l'^{Ti,3) X IlpJ'^{RP^) X x Rf ^ M is defined as 



Wl,3(Ei, 72, 73) = )U 



X 



(23) 



where J"i,3 = {/(61, /i2,i) = a^&i (maxj6^*(h2 1) (/i2,i0(73) + ^2,i72,j) - T^^eAUh2.i) ^2,i72,i) 
l#^*(^2,i)>i : a e R^'S^ = (0{, . . . e np^r(MP3),^2 e R^'^73 e M^M|a|| < K}. 

4. pi,o : i:>pi X l^iTifi) X np2/°°(R^^) X riK^iRP^/^^) X RP3 ^ R is defined as 



pi,o(Ei,/x,0, A,73) = 



c{Ei^5i ( mjK i/I,i0i(73) - . mj«f i/I,i0i(73) ) l#^-(//2,i)=i 
yieyi2(-n2,ij »e>i5(ii2,i) / 

(24) 

where J^i,o = {/(61, /i2,i) = a'^^i (maXig^2(/,2^) /i^,i</)i(73) - maxie^*(ft2^) /i^,i0»(73)) l#yi-(h2,i)=i 

a e R^s ||a|| < = (01, . . . ,0;^JT e np2/°°(R^^), A e i^x2(M^'/'^'),73 e R^'^}. 

5. : X /~(^i,i) X UpJ°^{W3) X f]i^2(^^'^^') X X RPa ^ R is defined as 



pi,i(Ei,//, 0,^,72, 73) = // 



clEi^Ei( max fi^l,i0i(73)+^2,i72,O~. ^^3^ ,^2,i72. 



^ (^2,10i(73) + ^^2,l72,0 + max Hl^j2,z) l#AUH2,i)>l 



£^^2(^2,1) 

, (25) 



where J^i,! = {/(6i,/i2,i) = 0^61 ( maxj6^2(fe,i) (^2,i0»(73)+^2,i72,i) -niaxi6^2(/»2,i) ^2,i72,. 
6. pi,2 : Dp^xD';^^j^^x...xD, 



X 



X X X X M ^ M is defined as 



X 



Pl,2(Sl,C2,l. • • ■ ,C2,K2,I^,<P, vA2,vA2,Z/2,?72,72,73, A) 



cl^i^BA max fi^2,i0i(73)+^^2,i72,O- 1/2^72, — . max i/2V</'i(73) ) 

\ie^2(-f^2,i) jeyl2(i?2,i) «6^2(-f^2,i) / 



^ . («Jl''2,i+ffll'72,i-">ax 7,i(HT ■.2,j+ffJi'72j))2^ l#^*(J/2,l)>l 

' mmi i ^ T — '■ ' <A 

^2 l«2,i»2,l 



(26) 



where J'i,2 = {/(&i,/i2,i) = a''bi(^iaaXi^A'2{h2,i) (^2,i0i(73)+^2,i72,i)-niaxjg^^(ft2,i) /i2,i72,: 



maxje^2{fe2.i) ^2,i</'i(73)) x (^1 



(''2 l''2,i + 'i2 l''2,i-maxjjj(feT j^i/2,j+'i2 l''2,j))^ ^, 
miiij ^ '- f '- ' <A 



- 1 



,i)>l) 



''2,l'^2,i'>2,l 

Rf3,max{||a||, II1/2II} < i^,C2,i e i^J^/if^'^ = 1' ■ ■ ■ '^2}. 
7. pi,3 : X /°°(^i,3) X np2/°°(M^'3) X Q^^^l^^'^^') x x ^ M is defined as 



pi,3(Si,/i, 0,^,772,^/3) = 



c{Si ^Bi max {Hl^Mva) + Hl ^ri2,r) 

\ l=l,...,K2 



max Hl^r]2,i- max Hl^(j)i{ri3))l#AUH2,i)=i 

t—l,...,K2 l&A2(H2,l) ' 



, (27) 



where J'1,3 = {/(6i,/i2,i) = 0^61 maxj=i,...,if2 (/i5_i0j(ry3)+/i5 i7/2,i) -maxj=i,...,if2 /i^ i?72,i 
-max,e^2(/.2.i) ^2,i<^^(^3)) l#^^(/.2,i)=i : a G R^M |a| | < X, = (01, . . . , e UpJ^{mP^),A2 e 



Define r : D^, x D^^^/^^ x . . . x Dj^/^^ x /-(J-i,i) x x np2^°°(K^^) x Qk2{^'"^^'''') x 
Qk^C^'^^^^^) X X X X X x R^^ R as 

t(Si, C2,1> ■ ■ ■ > C2,K2,(^, 1^, 0, A, ^2, /3, ^2, m, 72, 73) 

= Wi,i(Ei, a;, ^) + Wi,2(Si, 0, 73) + Pi,o(Si, /x, 0, ^2, 73) + ^1,3(^1, /x, 0, ^2, ?72, Vs) 

- Pl,2(Si, (2,1, ■ ■ ■ , C2,K2,fJ', 0, A, ^2, i^2, V2, V2,V3, A„) 
+ Wi,3(Ei, II, 0, 72, 73) + 0, 72, 73) 

+ Pl,2(Si, (2,1, ■ ■ ■ , C2,K2,IJ', 0, A, ^2, i^2, ?72, 72, 73, An) 

where — T\^2 ® ^1,3 ® ^1,0 ® -^1,1 ® -^1,2 ® ^1,3, and ® denotes element-wise addition. 
Then 

C\^{h - = t(Ei, C2,l, ■ ■ ■ , C2,/^2, Gn, Pn, ^2,^, A, ^2, 

^{h - /32*), v^/?;, v^/32, v^/33*, v^/Ss)- 

cIV^(/3i - /3*,J = t(Ei, C2,1, . . . , C2,/f2, V^(Pn - Pn), Pn, ¥2^^, ^2, ^2,n, 

(/3l*,n,/32*n)^ V^(/32 - /32,n), \f^P'*2,n^ \f^^\n^ \f^^l,n^ \/^/^3,n), 

where V^^ is defined in (20). The upper bound U.\{c\) under P is 

Wi(ci) = sup r(Ei,C2,l,- • • ,C2,i^2,'^^n,Pn,V2,n, A, A, 

(^r, ^2*')^ ^(^2 - ^2*), V^/?2*, 72, V^/?;, 73) . 

The upper bound $\mathcal{U}_1(c_1)$ under $P_n$ is

Wi(ci) = sup t(Ei, C2,1, ■ ■ ■ , C2,i^2, V^(Pn " n.), Pn, Vj^, A, An, 

72e]RP2,73eMJ'3 ^ 

(^1*,;, V^(^2 - ^2%), V^^2*n, 72, V^^3*„, 73) . 
And the bootstrap analog of the upper bound is

U?{c.) = sup r(Ef\c£\---,gl, V^(Fi^) -F„),Pi^vJl,^ 
72 eiR^2 ,73 SIRE'S ^ 

CPI^PI)\ - ^2), 72, V^k, 73) • 

Results for $t = 1$ in Theorems 2.1 and 2.2 rely on the limiting behavior of desired
quantities (e.g. joint convergence of $(\mathbb{G}_n, \mathbb{V}_{2,n}, \sqrt{n}(\hat\beta_2 - \beta_2^*))$ to $(\mathbb{G}_\infty, \mathbb{V}_{2,\infty}, V_{2,\infty})$, where the
formula of $\mathbb{V}_{2,\infty}$ is given in (28)), desired continuity of the $w_{1,\cdot}$'s, negligibility of the error terms
$\rho_{1,\cdot}$'s, and the fact that $P(\mathbb{V}_{2,\infty} \in \prod_{p_2} C_b(\mathbb{R}^{p_3})) = 1$, where $\mathbb{V}_{2,\infty}$ is given in (28) and we use
$\prod_{p_2} C_b(\mathbb{R}^{p_3})$ to denote the set of continuous bounded $p_2$ vector-valued functions on $\mathbb{R}^{p_3}$.

The limiting process of $\mathbb{G}_n$ and limiting distribution of $\sqrt{n}(\hat\beta_2 - \beta_2^*)$ have been given
in previous sections. The limiting process of $\mathbb{V}_{2,n}$ is given in Lemma 2.5 below. Joint
convergence of these three quantities can be obtained using arguments similar to those in the proof
of Lemma 2.5. The continuity of the $w_{1,\cdot}$'s and negligibility of the error terms can be proved
using similar arguments as before. To establish $P(\mathbb{V}_{2,\infty} \in \prod_{p_2} C_b(\mathbb{R}^{p_3})) = 1$, we only need to
show that the sample paths for each component of $\mathbb{V}_{2,\infty}$ are continuous w.p.1. Note that the
$j$-th component of $\mathbb{V}_{2,\infty}(\gamma_3)$ can be written as

$$w_{2,1}(\Sigma_{2,\infty}, \mathbb{G}_\infty, P, V_{3,\infty}, (\beta_2^{*T}, \beta_3^{*T})^T) + w_{2,2}(\Sigma_{2,\infty}, P, V_{3,\infty}, \gamma_3)$$

with $c_2$ = the $j$-th column of $I_{p_2 \times p_2}$. The sample path continuity follows by noticing that
$w_{2,2}$ is continuous at points $(\Sigma_{2,\infty}, P, V_{3,\infty}, \gamma_3)$ uniformly in $\gamma_3 \in \mathbb{R}^{p_3}$. □

Below we give two lemmas that describe the limiting behavior of the desired quantities. 
Lemma 2.5. Assume (A1)-(A4). Then



V2,n V2,oo, Vj;, -*p„ ¥2,00, and Y^^^ -^p^ ¥2,00 m P-prohahility 



in Xlp^l'=°{W^), where 



ieyl3(ii3,lj 



+ ^2i^52( . ^i^3'i(V3,oo,^ + 73,i) - . max i/3Ti73Jl#^*(^^ )>i. (2^ 

\ie^|(i/3,i) ie^^(if3,i) y 



Proof. To prove the first result, we only need to show that 



sup 

/eBLi(nj,2io°(Mf3)) 



E7(¥2,„)-E/(¥2,oo) 



^ 0. 



For convenience, we use ¥'^' to denote the j-th component of ¥ for any ¥ e IipJ,^{W^). 
Then 



sup 

/eBLi(np2i°°(Kf3)) 



E7(¥2,„)-E/(¥2 
< E* 



< sup 

/eSLi(np2«°' 



E* 



/(¥2,„)-/(¥2,< 



sup 



b1 



P2 

< V E* sup 



V^:i(73)-V^^,^^(73) 



b1 



where the second inequality follows from the fact that / e BLiilip.^l'^iW'''''')). Define 
/,(¥W) ^ sup^3gij.3 |vb'^(73)-¥j,L(73)| for any ¥^1 e /~(Mf3), = l,...,^^. Then 
e SLi(Z-(Rf3)) and /,(¥2^;L) = 0. Since ¥2^;^ Yf^^, we have 



E* sup 
736MP3 



V£U73) - V^^' (73) = E7.(¥^^) - m^^iloo) ^ 



r\j] 
for $j = 1, \ldots, p_2$. This completes the proof of $\mathbb{V}_{2,n} \rightsquigarrow \mathbb{V}_{2,\infty}$. Similar arguments can be used
to prove results 2 and 3, and are omitted. □

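Lemma 2.6 (Consistency of $\hat\beta_t$). Assume (A1)-(A2), then for each $t$ it follows that $\sqrt{n}(\hat\beta_t - \beta_t^*) = O_P(1)$
and $\sqrt{n}(\hat\beta_t^{(b)} - \hat\beta_t) = O_{P_M}(1)$ in $P$-probability. If, in addition (A4) holds, then

$\sqrt{n}(\hat\beta_t - \beta^*_{t,n}) = O_{P_n}(1)$.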

Proof. The proof proceeds by backwards induction. The base case follows immediately since
$\hat\beta_T$ is the usual least squares estimator, so that $\sqrt{n}(\hat\beta_T - \beta_T^*)$ is asymptotically normal and
thus $O_P(1)$. Suppose, as the inductive step, that $\sqrt{n}(\hat\beta_{t+1} - \beta^*_{t+1}) = O_P(1)$; the result follows
if we can establish that $\sqrt{n}(\hat\beta_t - \beta_t^*) = O_P(1)$. Note that $\sqrt{n}(\hat\beta_t - \beta_t^*)$ can be decomposed
as follows

$$\sqrt{n}(\hat\beta_t - \beta_t^*) = W'_{t,n} + \hat\Sigma_t^{-1}\mathbb{P}_n B_t U_{t+1,n}. \qquad (29)$$

The proof that $W'_{t,n}$ is $O_P(1)$ is immediate and omitted. Consider the second term.

$$\begin{aligned} \|\mathbb{P}_n B_t U_{t+1,n}\| &= \Big\| \mathbb{P}_n B_t \sqrt{n}\Big( \max_{1 \le i \le K_{t+1}} H_{t+1,1}^T\hat\beta_{t+1,i} - \max_{1 \le i \le K_{t+1}} H_{t+1,1}^T\beta^*_{t+1,i} \Big) \Big\| \\ &\le \mathbb{P}_n \|B_t\| \max_{1 \le i \le K_{t+1}} \left| H_{t+1,1}^T \sqrt{n}(\hat\beta_{t+1,i} - \beta^*_{t+1,i}) \right| \\ &\le \mathbb{P}_n \|B_t\| \, \|H_{t+1,1}\| \max_{1 \le i \le K_{t+1}} \left\| \sqrt{n}(\hat\beta_{t+1,i} - \beta^*_{t+1,i}) \right\| \\ &= O_P(1), \end{aligned}$$

where the last equality follows from the LLN and the induction hypothesis, and the series of
inequalities follows from repeated use of the Cauchy-Schwarz inequality and the fact that
$|\max_z f(z) - \max_z g(z)| \le \max_z |f(z) - g(z)|$. This proves the first part of the result.
The second and third parts of the result follow from an identical argument. In particular,
$\sqrt{n}(\hat\beta_T^{(b)} - \hat\beta_T)$ converges to the same limiting distribution as $\sqrt{n}(\hat\beta_T - \beta_T^*)$ in probability by
the bootstrap central limit theorem (see for example Bickel and Freedman 1981) and hence
satisfies the condition stated in the theorem. The same induction argument succeeds with
only minor changes in notation. □
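
The elementary inequality used in the proof above, that the difference of two maxima is bounded by the maximum absolute difference, can be checked numerically. The short script below is only a sanity check on randomly generated vectors; the helper name `max_gap_holds` and the simulation settings are hypothetical.

```python
import random

rng = random.Random(1)


def max_gap_holds(f, g):
    """Check |max f - max g| <= max |f - g| for two equal-length lists."""
    lhs = abs(max(f) - max(g))
    rhs = max(abs(a - b) for a, b in zip(f, g))
    return lhs <= rhs + 1e-12  # small slack for floating point rounding


checks = [
    max_gap_holds(
        [rng.gauss(0.0, 1.0) for _ in range(5)],
        [rng.gauss(0.0, 1.0) for _ in range(5)],
    )
    for _ in range(1000)
]
all_hold = all(checks)
```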



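Lemma 2.7. Assume (A1)-(A3). For any $t = 1, \ldots, T$, define $\Delta_t(h_{t,1}) = \#\{\hat{\mathcal{A}}_t(h_{t,1}) \triangle \mathcal{A}^*_t(h_{t,1})\}$.
Let $\epsilon > 0$ be arbitrary. There exists a subset $B_{t,\epsilon}$ of $\mathbb{R}^{p_t/K_t}$ satisfying $P(H_{t,1} \in B_{t,\epsilon}) \ge 1 - \epsilon$,
and $\sup_{h_{t,1} \in B_{t,\epsilon}} \Delta_t(h_{t,1}) = o_P(1)$.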

Proof. For any fixed arbitrary $\epsilon > 0$, we can choose a sufficiently small $\delta > 0$ so that

P ( < if <5 \ < e/Kt 



for i = 1, . . . , Kt^ where we have defined 0/0 — for convenience. Define ^ = n^=*i Bt,e,h 
where 

Bt,i ^ 1 : = or l^liA^i - max^p -^mAjI ^ ^ 

The union bound ensures that P{Ht i e g) > 1 — e. 

To establish the result, we first show that sup^^^ lefitc "^(-^tiht^i) \ Al{ht^i)) = op(l). Note 
that sup^^ jgB^ ^ \ be decomposed as 



sup \ A*(/^t,l))lmin,=,....,, t,,(.,0>A„ 

+ sup #(A(/i*,i)\A*(^M))Wi x,t.,(..,)<A„ (30) 
For any ht,i G Bt^^, let j*(/it,i) denote an element in e At{ht,i). Then 



sup 



ht.i&Bt,e,ie{l,...,Kt}\At{ht,i) I \^t,l 1 1 

= sup — j-j — 

<{2\\Yt,n\\-^/^S)<0 

in probability as $n \to \infty$, where the inequality follows from the definition of $B_{t,\epsilon}$. This
together with the definition of $\hat{\mathcal{A}}_t(h_{t,1})$ for $\min_{i=1,\ldots,K_t} \hat T_{t,i}(h_{t,1}) > \lambda_n$ implies that the first
term of (30) is $o_P(1)$.

In addition, it is easy to see that for any ht,i G -Bj^g, i e {1, . . . , Kt} \ A^{ht,i), 



> 



where i>t^j denotes the largest eigenvalue of (t,i- Notice that the numerator on the right hand 
side of the above display is further bounded below by 

^maxhl,i/3l^ - - max |/i,VVt,„,,| - \hl,Yt,n,\ > (V^S - 2||V,,,||)||/i,,i||. 

Thus 



/MM > y^'^-2iiv.,nii ^ ^ 

ht,ieBt,e,ie{l,-,Kt}\A^{ht,i) y A„ -y/ A„ UlSiKj=i^,„^Kt ^t,3 

in probability as $n \to \infty$. This together with the definition of $\hat{A}_t(h_{t,1})$ for $\min_{i=1,\ldots,K_t} T_{t,i}(h_{t,1}) \le \lambda_n$ 
implies that the second term of (30) is $o_P(1)$. 

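Next we show that $\sup_{h_{t,1} \in B_{t,\epsilon}} \#(A_t^*(h_{t,1}) \setminus \hat{A}_t(h_{t,1})) = o_P(1)$. Again, we decompose 
this supremum as 

$$\sup_{h_{t,1} \in B_{t,\epsilon}} \#(A_t^*(h_{t,1}) \setminus \hat{A}_t(h_{t,1}))\, 1_{\#A_t^*(h_{t,1}) = 1} + \sup_{h_{t,1} \in B_{t,\epsilon}} \#(A_t^*(h_{t,1}) \setminus \hat{A}_t(h_{t,1}))\, 1_{\#A_t^*(h_{t,1}) > 1}. \quad (31)$$ 

For any $h_{t,1} \in B_{t,\epsilon} \cap \{\#A_t^*(h_{t,1}) = 1\}$, we have 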

mill 1 / — — — > mill 



i=i,...,Kt y A„ i=i,...,Kt ^JKh,i\\ht,l\\ 

Using similar arguments as above, it is easy to show that the above display is further bounded 
below by {^/nrj — 2 1 1 Vt,„ \\)/ ■\/^n '^^'^j=\,...,Kt h,j , which does not depend on ht,i. Thus 



mm \ — — — > = > 1 

ht,ieBt,en{#At{ht,i)=l},i=l,...,Kt y A„ y/Xn^^i=l,...,Kt^t,i 

in probability as $n \to \infty$. Furthermore, let $j^*(h_{t,1}) = A_t^*(h_{t,1})$. Then 
mm ■ ■ — ■ ■ 

ht,i(^Bt,er\{#Atiht,i)=l} \\ht,i 1 1 



> min 



ht,i(^Bt,,n{#AUht,i)=i} \\ht,i 



> V^d - 2\\Yt,n\\ > 



in probability as $n \to \infty$. This implies that the first term of (31) is $o_P(1)$. 

For any $h_{t,1}$ such that $\#A_t^*(h_{t,1}) > 1$, let $j^*(h_{t,1})$ denote an arbitrary element in $A_t^*(h_{t,1})$. 
It is easy to verify that 



An, 





mt,n\ 


1 + 




\/K^t,j*(ht,i)\\ht,i\ 
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where at,i is the smallest eigenvalue of (t,i- Next note that when :^A^(ht,i) > 1, 



< max l|Vt,„,j|| < ||V4,„||. 



To see this, let i{ht,i) = argmaxi^j*(fe, /ij^ {Vt,n,i + v^(A*i - and j'(/it,i) be an 

element in Aliht^i) \ and notice that 

where the second inequality makes use of the fact that hl^^{f3^ ^ — f3tj*(ht i)) — ^ with equality 
holding when i e Al{ht,i). This result, the preceding discussion, and some algebra show 
that Tt^i{ht,i) / Xn is bounded above by 



, ZMM < 2||v,,„| 



ht,ie{ht,i:*A*t{ht,i)>i},ieAt{ht,i) V ^ri -^A„minj=i^...^xt ^t/ 

which is $o_P(1)$. Thus the second term of (31) is $o_P(1)$. This completes the proof. □ 



3 Bias reduction for non-regular problems 

In this section we briefly discuss the issue of bias reduction for non-regular problems. It 
is now well known that unbiased estimators do not exist for non-smooth functionals (see 
Robins 2004, appendix I; and Porter and Hirano 2009). Furthermore, it has been shown that 
attempting to reduce the bias at a non-regular point in the parameter space can dramatically 
inflate the variance and subsequently the MSE elsewhere in the parameter space (Doss and 
Sethuraman 1989; Brown and Liu 1993; Chen 2004). Here, we attempt to illustrate this 
phenomenon in a toy example that is relevant for medical decision making. 

Suppose that $X_1$ and $X_2$ are independent normal random variables with means $\mu_1$ and 
$\mu_2$ respectively; both are assumed to have unit variance. We consider the task of estimating 
$\theta = \max(\mu_1, \mu_2)$ based on a single observation $X_1 = x_1$ and $X_2 = x_2$. Notice that this 
problem corresponds to a toy decision making problem where $\mu_i$ denotes the mean response 
for patients following treatment $i$. The MLE is given by $\hat\theta_{\mathrm{mle}} = \max(X_1, X_2)$. It is clear that 
the MLE suffers from upward bias since 

$\theta = \max(\mu_1, \mu_2) = \max(E X_1, E X_2) \le E \max(X_1, X_2)$. 

It will be convenient to write $\hat\theta_{\mathrm{mle}}$ as 

$\hat\theta_{\mathrm{mle}} = (X_1 + X_2)/2 + |X_1 - X_2|/2$. 



The first term on the right hand side of the above display is the UMVU estimator of $\theta$ when 
there is no treatment effect (i.e. $\mu_1 = \mu_2$). The second term can be seen as an estimator 
of the advantage of recommending treatment via the decision rule $\arg\max_{j=1,2} X_j$ compared 
with randomly assigning treatment according to an even odds coin flip. The thresholding 
estimators of Chakraborty et al. (2009) and Moodie and Richardson (2007) shrink the term 
$|X_1 - X_2|/2$ towards zero in an attempt to alleviate some of the bias inherent to $\hat\theta_{\mathrm{mle}}$. In 
particular, an analogue of the soft-thresholding estimator of Chakraborty et al. (2009) for 
this problem is given by 



$\hat\theta_{\mathrm{soft}} = (X_1 + X_2)/2 + \left[1 - \frac{\lambda}{|X_1 - X_2|}\right]_+ |X_1 - X_2|/2$, 



where $\lambda$ denotes a tuning parameter. An analogue of the hard-thresholding estimator of 
Moodie and Richardson (2007) is given by 

$\hat\theta_{\mathrm{hard}} = (X_1 + X_2)/2 + 1_{|X_1 - X_2| > \lambda}\, |X_1 - X_2|/2$, 
[Figure 1 panels: "Bias of soft thresholding estimator" (left), "MSE of soft thresholding estimator" (right); horizontal axis $\mu_1 - \mu_2$ in both panels.] 

Figure 1: Left: The bias of $\hat\theta_{\mathrm{soft}}$ as a function of effect size $\mu_1 - \mu_2$ and tuning parameter 
$\lambda$. Reducing the bias at $\mu_1 - \mu_2 = 0$ requires increasing $\lambda$, which is seen to dramatically 
inflate bias elsewhere. Right: The MSE of $\hat\theta_{\mathrm{soft}}$ as a function of effect size $\mu_1 - \mu_2$ and tuning 
parameter $\lambda$. Attempting to reduce the bias at $\mu_1 - \mu_2 = 0$ results in a modest reduction in 
MSE at $\mu_1 - \mu_2 = 0$ but inflates the MSE significantly elsewhere. 

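where again $\lambda$ is a tuning parameter. Notice that both estimators reduce to $\hat\theta_{\mathrm{mle}}$ when $\lambda = 0$. 
As we will see, the bias of $\hat\theta_{\mathrm{mle}}$ is largest when $\mu_1 = \mu_2$. Both $\hat\theta_{\mathrm{soft}}$ and $\hat\theta_{\mathrm{hard}}$ seek to alleviate 
some of this bias by shrinking $\hat\theta_{\mathrm{mle}}$ towards $(X_1 + X_2)/2$ whenever $|X_1 - X_2|$ is small. 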
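
The bias of the MLE in this toy problem can be written in closed form. The calculation below is our own illustration rather than part of the source. It uses only the mean of a folded normal distribution, with $\delta = \mu_1 - \mu_2$ (our notation) and $\Phi$ the standard normal distribution function.

```latex
\begin{align*}
X_1 - X_2 &\sim N(\delta, 2), \\
E|X_1 - X_2| &= \frac{2}{\sqrt{\pi}}\, e^{-\delta^2/4}
   + \delta\,\bigl\{ 2\Phi(\delta/\sqrt{2}) - 1 \bigr\}, \\
E\,\hat\theta_{\mathrm{mle}} - \theta
  &= \tfrac{1}{2} E|X_1 - X_2| - \tfrac{1}{2}|\delta|
   = \frac{1}{\sqrt{\pi}}\, e^{-\delta^2/4} - |\delta|\,\Phi\bigl(-|\delta|/\sqrt{2}\bigr).
\end{align*}
```

At $\delta = 0$ the bias equals $1/\sqrt{\pi} \approx 0.564$. Its derivative with respect to $|\delta|$ is $-\Phi(-|\delta|/\sqrt{2}) < 0$, so under these assumptions the bias is largest at $\mu_1 = \mu_2$ and decays as the effect size grows.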

Figure 1 shows the bias and MSE of the soft-thresholding estimator $\hat\theta_{\mathrm{soft}}$ as a function of 
effect size $\mu_1 - \mu_2$ and tuning parameter $\lambda$. The figure shows that increasing $\lambda$ decreases the bias 
at $\mu_1 - \mu_2 = 0$; however, modest increases in $\lambda$ lead to dramatic increases in bias at 
non-zero values of $\mu_1 - \mu_2$ and subsequently inflate the MSE. Figure 2 shows results of a 
similar nature for the hard-thresholding estimator $\hat\theta_{\mathrm{hard}}$. These figures show that the price of 
bias reduction at $\mu_1 - \mu_2 = 0$ can be quite severe unless one has very strong prior knowledge 
about the true value of $\mu_1 - \mu_2$. 
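
The trade-off described above can be explored numerically. The following is a minimal Monte Carlo sketch, not the code used to produce the figures. It assumes the soft-thresholding analogue uses a positive-part shrinkage factor, so that the shrunken term equals max(|X1 - X2| - lam, 0)/2, and it fixes mu2 = 0, which loses no generality because bias and MSE depend only on mu1 - mu2. All function names are ours.

```python
import random
import statistics


def theta_soft(x1, x2, lam):
    """Soft-thresholding analogue: (x1+x2)/2 + [1 - lam/|x1-x2|]_+ |x1-x2|/2.

    Assumption: the positive part is applied, so the second term is
    max(|x1 - x2| - lam, 0) / 2.
    """
    d = abs(x1 - x2)
    return (x1 + x2) / 2 + max(0.0, d - lam) / 2


def theta_hard(x1, x2, lam):
    """Hard-thresholding analogue: keep |x1-x2|/2 only when |x1-x2| > lam."""
    d = abs(x1 - x2)
    return (x1 + x2) / 2 + (d / 2 if d > lam else 0.0)


def bias_and_mse(estimator, delta, lam, reps=20000, seed=0):
    """Monte Carlo bias and MSE at mu1 = delta, mu2 = 0 (so mu1 - mu2 = delta)."""
    rng = random.Random(seed)
    theta = max(delta, 0.0)
    errors = [
        estimator(rng.gauss(delta, 1.0), rng.gauss(0.0, 1.0), lam) - theta
        for _ in range(reps)
    ]
    return statistics.fmean(errors), statistics.fmean(e * e for e in errors)


if __name__ == "__main__":
    for name, est in (("soft", theta_soft), ("hard", theta_hard)):
        for lam in (0.0, 1.0, 2.0):
            for delta in (0.0, 1.0, 3.0):
                b, m = bias_and_mse(est, delta, lam)
                print(f"{name} lam={lam:.1f} delta={delta:.1f} bias={b:+.3f} mse={m:.3f}")
```

Setting lam = 0 recovers the MLE for both estimators, which gives a baseline against which the effect of larger tuning parameters can be read off.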
[Figure 2 panels: "Bias of hard thresholding estimator" (left), "MSE of hard thresholding estimator" (right).] 

Figure 2: Left: The bias of $\hat\theta_{\mathrm{hard}}$ as a function of effect size $\mu_1 - \mu_2$ and tuning parameter $\lambda$. 
Reducing the bias at $\mu_1 - \mu_2 = 0$ requires increasing $\lambda$, which is seen to dramatically inflate 
bias elsewhere. Right: The MSE of $\hat\theta_{\mathrm{hard}}$ as a function of effect size $\mu_1 - \mu_2$ and tuning 
parameter $\lambda$. Attempting to reduce the bias at $\mu_1 - \mu_2 = 0$ results in a modest reduction in 
MSE at $\mu_1 - \mu_2 = 0$ but inflates the MSE significantly elsewhere. 
4 Additional empirical results 

Here, we present additional empirical results for the ACI and competitors. We give results 
for the generative models in the main body of the paper with varying dataset sizes, for 
generative models with three treatments at the second stage, and for generative models with 
three stages of binary treatments. All of the results in this section are based on 1000 Monte 
Carlo repetitions, and for the ACI we use the tuning parameter $\lambda_n = \log\log n$. 

4.1 Varying dataset size 

First, we present a suite of experiments with the two-stage, two-action models presented in 
the main body of the paper, with varying data set size N. Tables 1 through 12 show our 
results. 
0.964   0.954   0.955   0.950   0.957   0.948   0.956   0.957


          Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
N = 300
CPB       0.899*  0.915*  0.947   0.949   0.939   0.967   0.961   0.946   0.949
PPE       0.949   0.946   0.952   0.948   0.941   0.948   0.958   0.949   0.949
ST        0.9?2   0.945   0.935*  0.929*  0.935*  0.644*  0.780*  0.869*  0.851*
ACI       0.970   0.976   0.969   0.970   0.956   0.973   0.965   0.972   0.975
N = 500
CPB       0.892*  0.906*  0.935*  0.933*  0.929*  0.942   0.943   0.931*  0.934*
PPE       0.936   0.938   0.941   0.937   0.929*  0.934*  0.938   0.943   0.937
ST        0.956   0.949   0.923*  0.917*  0.910*  0.664*  0.790*  0.895*  0.875*
ACI       0.965   0.976   0.964   0.968   0.952   0.950   0.944   0.966   0.967
N = 1000
CPB       0.907*  0.933*  0.933*  0.943   0.944   0.945   0.951   0.936   0.940
PPE       0.949   0.938   0.949   0.947   0.952   0.942   0.949   0.942   0.944
ST        0.953   0.933*  0.944   0.934*  0.934*  0.813*  0.880*  0.921*  0.892*
ACI       0.968   0.980   0.968   0.971   0.961   0.946   0.951   0.968   0.971

Table 1: Monte Carlo estimates of coverage probability of confidence intervals for 
$\beta^*_{1,0,1}$ (intercept term) at the 95% nominal level. Generative models have two stages and 
two actions per stage. Estimates are constructed using 1000 datasets of each size (150, 300, 500, 
and 1000) drawn from each model, and 1000 bootstraps drawn from each dataset. Esti- 
mates significantly below 0.95 at the 0.05 level are marked with *. Models are designated 
NR = non-regular, NNR = near-non-regular, R = regular. 
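
The source does not say which test produces the * marks. As an illustration only, the sketch below flags a coverage estimate when an exact binomial lower-tail probability falls below alpha/2, i.e. a two-sided test at level 0.05 of which only the lower side is reported. That choice, and the function names `lower_tail_pvalue` and `mark`, are our assumptions, so borderline entries need not agree with the published marks.

```python
import math


def lower_tail_pvalue(covered, reps, nominal=0.95):
    """P(X <= covered) for X ~ Binomial(reps, nominal), summed in log space."""
    log_p = math.log(nominal)
    log_q = math.log1p(-nominal)
    total = 0.0
    for k in range(covered + 1):
        log_pmf = (
            math.lgamma(reps + 1) - math.lgamma(k + 1) - math.lgamma(reps - k + 1)
            + k * log_p + (reps - k) * log_q
        )
        total += math.exp(log_pmf)
    return total


def mark(coverage, reps=1000, nominal=0.95, alpha=0.05):
    """Format a coverage estimate, appending * when it is flagged as too low."""
    covered = round(coverage * reps)  # number of intervals that covered the truth
    flagged = lower_tail_pvalue(covered, reps, nominal) < alpha / 2  # assumed rule
    return f"{coverage:.3f}" + ("*" if flagged else "")


if __name__ == "__main__":
    for c in (0.899, 0.933, 0.949, 0.967):
        print(mark(c))
```

With 1000 Monte Carlo repetitions the standard error of a coverage estimate near 0.95 is roughly 0.007, which is why estimates only a little below 0.95 are left unmarked.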
          Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
N = 150
CPB       0.404*  0.404*  0.430*  0.429*  0.457   0.449*  0.450   0.428*  0.428*
PPE       0.376*  0.376*  0.418*  0.418*  0.451*  0.448*  0.453*  0.410*  0.410*
ST        0.344*  0.344*  0.427   0.427   0.4??   0.469   0.474*  0.430*  0.428
ACI       0.518   0.518   0.487   0.487   0.486   0.494   0.476   0.497   0.498
N = 300
CPB       0.284*  0.284*  0.300   0.300   0.320   0.314   0.314   0.299   0.299
PPE       0.264   0.264   0.292   0.292   0.316   0.316   0.317   0.292   0.292
ST        0.240   0.240   ?       0.28?*  0.319*  0.326*  0.324*  0.307*  0.307*
ACI       0.367   0.367   0.343   0.343   0.341   0.338   0.328   0.343   0.344
N = 500
CPB       0.218*  0.218*  0.232*  0.232*  0.248*  0.243   0.243   0.232*  0.232*
PPE       0.203   0.203   0.226   0.226   0.245*  0.247*  0.245   0.226   0.226
ST        0.184   0.185   0.221*  0.222*  0.245*  0.253*  0.251*  0.232*  0.232*
ACI       0.284   0.284   0.265   0.265   0.265   0.255   0.249   0.265   0.265
N = 1000
CPB       0.155*  0.155*  0.164*  0.164   0.175   0.171   0.171   0.164   0.164
PPE       0.144   0.144   0.159   0.160   0.173   0.173   0.172   0.159   0.159
ST        0.131   0.131*  0.156   0.156*  0.172*  0.179*  0.176*  0.159*  0.159*
ACI       0.202   0.202   0.188   0.188   0.187   0.174   0.172   0.188   0.188

Table 2: Monte Carlo estimates of the mean width of confidence intervals for 
$\beta^*_{1,0,1}$ (intercept term) at the 95% nominal level. Generative models have two stages and 
two actions per stage. Estimates are constructed using 1000 datasets of each size (150, 300, 500, 
and 1000) drawn from each model, and 1000 bootstraps drawn from each dataset. Esti- 
mates significantly below 0.95 at the 0.05 level are marked with *. Models are designated 
NR = non-regular, NNR = near-non-regular, R = regular. 
          Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
N = 150
CPB       0.942   0.944   0.948   0.948   0.928*  0.942   0.939   0.944   0.944
PPE       0.946   0.946   0.945   0.945   0.931*  0.936   0.939   0.946   0.947
ST        0.940   0.946   0.950   0.950   0.941   0.941   0.941   0.945   0.946
ACI       0.964   0.966   0.958   0.957   0.941   0.947   0.940   0.954   0.954
N = 300
CPB       0.942   0.947   0.952   0.950   0.948   0.946   0.958   0.945   0.945
PPE       0.944   0.946   0.953   0.953   0.943   0.942   0.956   0.945   0.945
ST        0.945   0.945   0.948   0.949   0.951   0.940   0.955   0.944   0.944
ACI       0.960   0.959   0.957   0.957   0.955   0.946   0.958   0.951   0.951
N = 500
CPB       0.948   0.951   0.954   0.953   0.948   0.952   0.953   0.953   0.953
PPE       0.948   0.950   0.955   0.953   0.951   0.951   0.952   0.949   0.948
ST        0.948   0.948   0.954   0.953   0.951   0.952   0.949   0.952   0.951
ACI       0.967   0.966   0.964   0.964   0.961   0.952   0.953   0.959   0.959
N = 1000
CPB       0.941   0.945   0.938   0.944   0.937   0.941   0.941   0.943   0.942
PPE       0.942   0.944   0.939   0.942   0.936   0.940   0.941   0.943   0.943
ST        0.945   0.947   0.944   0.943   0.941   0.939   0.945   0.942   0.943
ACI       0.963   0.961   0.955   0.955   0.945   0.941   0.941   0.947   0.947

Table 3: Monte Carlo estimates of coverage probability of confidence intervals for 
$\beta^*_{1,0,2}$ (main effect of history) at the 95% nominal level. Generative models have two stages 
and two actions per stage. Estimates are constructed using 1000 datasets of each size (150, 300, 
500, and 1000) drawn from each model, and 1000 bootstraps drawn from each dataset. 
Estimates significantly below 0.95 at the 0.05 level are marked with *. Models are designated 
NR = non-regular, NNR = near-non-regular, R = regular. 
          Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
N = 150
CPB       0.331   0.331   0.333   0.333   0.379*  0.354   0.355   0.329   0.329
PPE       0.330   0.330   0.332   0.332   0.376*  0.350   0.353   0.329   0.329
ST        0.?2?   0.328   0.332   0.332   0.384   0.???   0.3??   0.329   0.329
ACI       0.360   0.360   0.347   0.348   0.392   0.359   0.358   0.339   0.339
N = 300
CPB       0.231   0.231   0.232   0.232   0.265   0.246   0.246   0.229   0.229
PPE       0.230   0.230   0.231   0.231   0.263   0.245   0.246   0.229   0.229
ST        0.229   0.229   0.231   0.231   0.2??   0.2??   0.249   0.229   0.229
ACI       0.251   0.251   0.242   0.242   0.275   0.248   0.247   0.233   0.233
N = 500
CPB       0.178   0.178   0.179   0.179   0.205   0.190   0.190   0.177   0.177
PPE       0.178   0.178   0.178   0.178   0.204   0.190   0.190   0.177   0.177
ST        0.177   0.177   0.178   0.178   0.205   0.193   0.192   0.177   0.177
ACI       0.194   0.194   0.187   0.187   0.213   0.191   0.191   0.179   0.179
N = 1000
CPB       0.126   0.126   0.126   0.126   0.145   0.134   0.134   0.124   0.124
PPE       0.125   0.125   0.126   0.126   0.144   0.134   0.134   0.124   0.124
ST        0.124   0.124   0.125   0.125   0.144   0.135   0.135   0.124   0.124
ACI       0.137   0.137   0.132   0.132   0.150   0.134   0.134   0.126   0.126

Table 4: Monte Carlo estimates of the mean width of confidence intervals for 
$\beta^*_{1,0,2}$ (main effect of history) at the 95% nominal level. Generative models have two stages 
and two actions per stage. Estimates are constructed using 1000 datasets of each size (150, 300, 
500, and 1000) drawn from each model, and 1000 bootstraps drawn from each dataset. 
Estimates significantly below 0.95 at the 0.05 level are marked with *. Models are designated 
NR = non-regular, NNR = near-non-regular, R = regular. 
          Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
N = 150
CPB       0.934*  0.935*  0.930*  0.933*  0.938   0.928*  0.939   0.925*  0.928*
PPE       0.931*  0.940   0.938   0.940   0.946   0.912*  0.931*  0.904*  0.903*
ST        0.94?   0.945   0.938   0.942   0.952   0.943   0.919   0.759*  0.??2*
ACI       0.992   0.992   0.968   0.972   0.957   0.955   0.950   0.964   0.965
N = 300
CPB       0.952   0.952   0.948   0.952   0.943   0.936   0.941   0.949   0.951
PPE       0.951   0.952   0.960   0.959   0.956   0.907*  0.944   0.952   0.954
ST        0.951   0.949   0.938   0.941   0.949   0.951   0.920*  0.877*  0.883*
ACI       0.994   0.994   0.975   0.976   0.962   0.957   0.950   0.977   0.976
N = 500
CPB       0.947   0.944   0.947   0.947   0.943   0.946   0.944   0.943   0.946
PPE       0.952   0.945   0.950   0.951   0.940   0.919*  0.945   0.945   0.944
ST        0.965   0.965   0.953   0.959   0.951   0.927*  0.910*  0.924*  0.935*
ACI       0.992   0.992   0.976   0.980   0.956   0.958   0.947   0.975   0.978
N = 1000
CPB       0.948   0.949   0.934*  0.939   0.950   0.954   0.951   0.939   0.947
PPE       0.948   0.949   0.948   0.945   0.952   0.941   0.948   0.950   0.950
ST        0.956   0.955   0.959   0.955   0.954   0.935*  0.924*  0.947   0.958
ACI       0.998   0.995   0.972   0.973   0.963   0.954   0.951   0.972   0.977

Table 5: Monte Carlo estimates of coverage probability of confidence intervals for 
$\beta^*_{1,1,1}$ (main effect of treatment) at the 95% nominal level. Generative models have two 
stages and two actions per stage. Estimates are constructed using 1000 datasets of each size 
(150, 300, 500, and 1000) drawn from each model, and 1000 bootstraps drawn from each 
dataset. Estimates significantly below 0.95 at the 0.05 level are marked with *. Models are 
designated NR = non-regular, NNR = near-non-regular, R = regular. 
          Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
N = 150
CPB       0.385*  0.385*  0.430*  0.430*  0.457   0.436*  0.451   0.428*  0.428*
PPE       0.365*  0.366   0.419   0.419   0.452   0.418*  0.452*  0.404*  0.403*
ST        0.339   0.339   0.42?   0.427   0.469   0.436   0.480   0.42?   0.424*
ACI       0.502   0.502   0.488   0.488   0.487   0.475   0.477   0.491   0.491
N = 300
CPB       0.269   0.269   0.300   0.300   0.320   0.309   0.313   0.299   0.299
PPE       0.256   0.256   0.292   0.292   0.316   0.297*  0.317   0.290   0.290
ST        0.237   0.237   0.289   0.289   0.320   0.313   0.327*  0.306*  0.307*
ACI       0.354   0.354   0.342   0.342   0.341   0.327   0.327   0.342   0.342
N = 500
CPB       0.208   0.208   0.232   0.232   0.248   0.242   0.244   0.232   0.232
PPE       0.197   0.197   0.226   0.226   0.245   0.234*  0.245   0.226   0.226
ST        0.182   0.183   0.222   0.222   0.246   0.252*  0.253*  0.232*  0.233*
ACI       0.275   0.275   0.265   0.265   0.265   0.250   0.250   0.265   0.265
N = 1000
CPB       0.147   0.147   0.164*  0.164   0.175   0.171   0.171   0.164   0.164
PPE       0.139   0.139   0.160   0.160   0.173   0.170   0.172   0.160   0.160
ST        0.129   0.129   0.156   0.156   0.172   0.184*  0.177*  0.159   0.159
ACI       0.195   0.195   0.188   0.188   0.187   0.173   0.173   0.188   0.188

Table 6: Monte Carlo estimates of the mean width of confidence intervals for 
$\beta^*_{1,1,1}$ (main effect of treatment) at the 95% nominal level. Generative models have two 
stages and two actions per stage. Estimates are constructed using 1000 datasets of each size 
(150, 300, 500, and 1000) drawn from each model, and 1000 bootstraps drawn from each 
dataset. Estimates significantly below 0.95 at the 0.05 level are marked with *. Models are 
designated NR = non-regular, NNR = near-non-regular, R = regular. 
0.941 
0.953 
0.954 

Table 7: Monte Carlo estimates of coverage probability of confidence intervals for 
$\beta^*_{1,1,2}$ (interaction between history and treatment) at the 95% nominal level. Generative 
models have two stages and two actions per stage. Estimates are constructed using 1000 
datasets of each size (150, 300, 500, and 1000) drawn from each model, and 1000 bootstraps 
drawn from each dataset. Estimates significantly below 0.95 at the 0.05 level are marked 
with *. Models are designated NR = non-regular, NNR = near-non-regular, R = regular. 






          Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
N = 150
CPB       0.331   0.331   0.333   0.332   0.363   0.354   0.355   0.329   0.329
PPE       0.330   0.330   0.332   0.332   0.361   0.350   0.353   0.328   0.328
ST        0.?2?   ?       0.332   0.332   0.366   0.359   0.3?0   0.329   0.329
ACI       0.360   0.360   0.347   0.347   0.378   0.359   0.358   0.339   0.339
N = 300
CPB       0.231   0.231   0.231   0.231   0.254   0.246   0.246   0.228   0.228
PPE       0.230   0.230   0.231   0.231   0.252   0.244   0.246   0.228   0.228
ST        ?       ?       0.230   0.230   0.2?4   0.250   0.249   0.229   0.229
ACI       0.251   0.250   0.241   0.241   0.264   0.248   0.247   0.233   0.233
N = 500
CPB       0.178

0.124 
0.124 

Table 8: Monte Carlo estimates of the mean width of confidence intervals for 
$\beta^*_{1,1,2}$ (interaction between history and treatment) at the 95% nominal level. Generative 
models have two stages and two actions per stage. Estimates are constructed using 1000 
datasets of each size (150, 300, 500, and 1000) drawn from each model, and 1000 bootstraps 
drawn from each dataset. Estimates significantly below 0.95 at the 0.05 level are marked 
with *. Models are designated NR = non-regular, NNR = near-non-regular, R = regular. 
0.953 
0.950 
ST        0.966 
0.972 

Table 9: Monte Carlo estimates of coverage probability of confidence intervals for 
the contrast $\beta^*_{1,1,1} + \beta^*_{1,1,2}$ (effect of action for history = 1) at the 95% nominal level. Genera- 
tive models have two stages and two actions per stage. Estimates are constructed using 1000 
datasets of each size (150, 300, 500, and 1000) drawn from each model, and 1000 bootstraps 
drawn from each dataset. Estimates significantly below 0.95 at the 0.05 level are marked 
with *. Models are designated NR = non-regular, NNR = near-non-regular, R = regular. 

Table 10: Monte Carlo estimates of the mean width of confidence intervals for 
the contrast $\beta^*_{1,1,1} + \beta^*_{1,1,2}$ (effect of action for history = 1) at the 95% nominal level. Genera- 
tive models have two stages and two actions per stage. Estimates are constructed using 1000 
datasets of each size (150, 300, 500, and 1000) drawn from each model, and 1000 bootstraps 
drawn from each dataset. Estimates significantly below 0.95 at the 0.05 level are marked 
with *. Models are designated NR = non-regular, NNR = near-non-regular, R = regular. 

Table 11: Monte Carlo estimates of coverage probability of confidence intervals for 
the contrast $\beta^*_{1,1,1} - \beta^*_{1,1,2}$ (effect of action for history = -1) at the 95% nominal level. Gen- 
erative models have two stages and two actions per stage. Estimates are constructed using 
1000 datasets of each size (150, 300, 500, and 1000) drawn from each model, and 1000 bootstraps 
drawn from each dataset. Estimates significantly below 0.95 at the 0.05 level are marked 
with *. Models are designated NR = non-regular, NNR = near-non-regular, R = regular. 
          Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
                                          0.221   0.217   0.218   0.202   0.202
ST        0.179   0.179   0.200   0.200   0.219   0.229   0.223*  0.201   0.201
ACI       0.241   0.241   0.230   0.230   0.242   0.219   0.219   0.229   0.229

Table 12: Monte Carlo estimates of the mean width of confidence intervals for 
the contrast $\beta^*_{1,1,1} - \beta^*_{1,1,2}$ (effect of action for history = -1) at the 95% nominal level. Gen- 
erative models have two stages and two actions per stage. Estimates are constructed using 
1000 datasets of each size (150, 300, 500, and 1000) drawn from each model, and 1000 bootstraps 
drawn from each dataset. Estimates significantly below 0.95 at the 0.05 level are marked 
with *. Models are designated NR = non-regular, NNR = near-non-regular, R = regular. 






4.2 Models with ternary actions 

Here, we present results using a suite of examples similar to those of Chakraborty et al. 
(2009), but that have three possible treatments at the second stage. These models are 
defined as follows: 

• $X_t \in \{-1, 1\}$ for $t \in \{1, 2\}$, $A_1 \in \{-1, 1\}$, and $A_2 \in \{(0, -0.5)^T, (-1, 0.5)^T, (1, 0.5)^T\}$ 

• $P(A_1 = 1) = P(A_1 = -1) = 1/2$, 

$P(A_2 = (0, -1)^T) = P(A_2 = (-1, 0.5)^T) = P(A_2 = (1, 0.5)^T) = 1/3$ 

• $P(X_1 = 1) = P(X_1 = -1) = 1/2$, $P(X_2 = 1 \mid X_1, A_1) = \mathrm{expit}(\delta_1 X_1 + \delta_2 A_1)$ 

• $Y_1 = 0$, 

$Y_2 = \xi_1 + \xi_2 X_1 + \xi_3 A_1 + \xi_4 X_1 A_1 + (\xi_5, \xi_6) A_2 + X_2 (\xi_7, \xi_8) A_2 + A_1 (\xi_9, \xi_{10}) A_2 + \epsilon$, $\epsilon \sim N(0, 1)$ 

where $\mathrm{expit}(x) = e^x/(1 + e^x)$. This class is parameterized by twelve values $\xi_1, \ldots, \xi_{10}, \delta_1, \delta_2$. 
The analysis model uses histories defined by: 

$H_{2,0} = (1, X_1, A_1, X_1 A_1, X_2)^T$ (32) 

$H_{2,1} = (1, X_2, A_1)^T$ (33) 

$H_{1,0} = (1, X_1)^T$ (34) 

$H_{1,1} = (1, X_1)^T$. (35) 

Our working models are given by $Q_2(H_2, A_2; \beta_2) = H_{2,0}^T\beta_{2,0} + H_{2,1}^T\beta_{2,1,1}A_{2,1} + H_{2,1}^T\beta_{2,1,2}A_{2,2}$ 
and $Q_1(H_1, A_1; \beta_1) = H_{1,0}^T\beta_{1,0} + H_{1,1}^T\beta_{1,1}A_1$. In Table 13, for each of these models we give 
the probability $p$ of generating a history where each of the three possible treatments at the 
second stage has exactly the same effect. This is analogous to having the second stage 
action show no effect in a binary model. Furthermore, because of the Helmert encoding we 
have used in our analysis models, and because of the structure of $\xi$, it happens that the 
Example   $\xi$                                                 $\delta$        Regularity
1         (0, 0, 0, 0, 0, 0, 0, 0, 0, 0)^T                      (0.5, 0.5)^T    p = 1, $\phi$ = 0/0
2         (0, 0, 0, 0, 0.01, 0.01, 0, 0, 0, 0)^T                (0.5, 0.5)^T    p = 0, $\phi$ = $\infty$
3         (0, 0, -0.5, 0, 0.5, 0.5, 0, 0, 0.5, 0.5)^T           (0.5, 0.5)^T    p = 1/2, $\phi$ = 1.0
4         (0, 0, -0.5, 0, 0.5, 0.5, 0, 0, 0.49, 0.49)^T         (0.5, 0.5)^T    p = 0, $\phi$ = 1.0204
5         (0, 0, -0.5, 0, 1.00, 1.00, 0.5, 0.5, 0.5, 0.5)^T     (1.0, 0.0)^T    p = 1/4, $\phi$ = 1.4142
6         (0, 0, -0.5, 0, 0.25, 0.25, 0.5, 0.5, 0.5, 0.5)^T     (0.1, 0.1)^T    p = 0, $\phi$ = 0.3451
A         (0, 0, -0.25, 0, 0.75, 0.75, 0.5, 0.5, 0.5, 0.5)^T    (0.1, 0.1)^T    p = 0, $\phi$ = 1.035
B         (0, 0, 0, 0, 0.25, 0.25, 0, 0, 0.25, 0.25)^T          (0, 0)^T        p = 1/2, $\phi$ = 1.00
C         (0, 0, 0, 0, 0.25, 0.25, 0, 0, 0.24, 0.24)^T          (0, 0)^T        p = 1/2, $\phi$ = 1.00

Table 13: Parameters indexing the example models. 



standardized effect size of treatment 1 versus treatment 2, treatment 1 versus treatment 3, 
and treatment 2 versus treatment 3 are all exactly equal in our examples. We report this as 
$\phi$ in Table 13. Tables 14 through 25 detail our results. 
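
To make the generative model of this subsection concrete, here is a minimal simulation sketch under our reading of the model description. It is not the authors' code. The outcome equation for Y2 is taken as the sum of main effects, the three A2 interaction terms, and standard normal noise. The second-stage codes are the ones that appear in the randomization probabilities; the action-space line lists (0, -0.5) for the first level instead, so the constant `A2_LEVELS` is an assumption to adjust. All function names are ours.

```python
import math
import random

# Second-stage treatment codes, taken from the randomization probabilities.
# Assumption: swap the first level for (0.0, -0.5) if that coding is intended.
A2_LEVELS = ((0.0, -1.0), (-1.0, 0.5), (1.0, 0.5))


def expit(x):
    return math.exp(x) / (1.0 + math.exp(x))


def a2_coefficients(xi, x2, a1):
    """Coefficient vector multiplying A2 in the mean of Y2, given X2 and A1."""
    return [xi[4 + k] + x2 * xi[6 + k] + a1 * xi[8 + k] for k in range(2)]


def simulate_patient(xi, delta, rng):
    """Draw one trajectory (X1, A1, X2, A2, Y2); xi has 10 entries, delta has 2."""
    x1 = rng.choice((-1, 1))
    a1 = rng.choice((-1, 1))
    x2 = 1 if rng.random() < expit(delta[0] * x1 + delta[1] * a1) else -1
    a2 = rng.choice(A2_LEVELS)
    c = a2_coefficients(xi, x2, a1)
    mean = (xi[0] + xi[1] * x1 + xi[2] * a1 + xi[3] * x1 * a1
            + c[0] * a2[0] + c[1] * a2[1])
    return x1, a1, x2, a2, mean + rng.gauss(0.0, 1.0)


def prob_no_effect(xi, delta):
    """Probability p that all three second-stage treatments share the same mean."""
    p = 0.0
    for x1 in (-1, 1):
        for a1 in (-1, 1):
            q = expit(delta[0] * x1 + delta[1] * a1)  # P(X2 = 1 | X1, A1)
            for x2, px2 in ((1, q), (-1, 1.0 - q)):
                c = a2_coefficients(xi, x2, a1)
                effects = [c[0] * a[0] + c[1] * a[1] for a in A2_LEVELS]
                if max(effects) - min(effects) < 1e-12:
                    p += 0.25 * px2  # P(X1 = x1) P(A1 = a1) = 1/4
    return p


if __name__ == "__main__":
    example5 = (0, 0, -0.5, 0, 1.0, 1.0, 0.5, 0.5, 0.5, 0.5)
    print(prob_no_effect(example5, (1.0, 0.0)))
    print(simulate_patient(example5, (1.0, 0.0), random.Random(0)))
```

Calling `prob_no_effect` with the parameters listed for an example gives a quick check of the corresponding value of p under this reading of the model.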
          Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
N = 150
CPB       ?*      ?       0.9??*  0.9?7*  0.940   ?*      0.9?4   0.900*  0.90?
PPE       0.?4?   0.?48   ?       0.929*  0.939   0.823*  0.923*  0.870*  0.860*
ACI       0.971   0.979   0.966   0.969   0.963   0.971   0.956   0.974   0.977
N = 300
CPB       0.835*  0.880*  0.931*  0.935*  0.931*  0.935*  0.941   0.927*  0.935*
PPE       0.944   0.942   0.948   0.948   0.940   0.863*  0.941   0.929*  0.928*
ACI       0.966   0.981   0.976   0.978   0.963   0.976   0.963   0.979   0.981

Table 14: Monte Carlo estimates of coverage probability of confidence intervals for 
$\beta^*_{1,0,1}$ (intercept term) at the 95% nominal level. Generative models have two stages and three 
actions at the second stage. Estimates are constructed using 1000 datasets of each size (150 and 300) 
drawn from each model, and 1000 bootstraps drawn from each dataset. Estimates sig- 
nificantly below 0.95 at the 0.05 level are marked with *. Models are designated NR = 
non-regular, NNR = near-non-regular, R = regular. 



N = 150   Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       0.473*  0.473*  0.517*  0.518*  0.568   0.540*  0.555*  0.512*  0.511*
PPE       0.433   0.433   0.501*  0.501*  0.559   0.518*  0.557*  0.482*  0.481*
ACI       0.742   0.742   0.664   0.664   0.645   0.671   0.627   0.694   0.695

N = 300   Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       0.327*  0.327*  0.359*  0.359*  0.395*  0.380*  0.387   0.358*  0.358*
PPE       0.296   0.296   0.347   0.347   0.389   0.370*  0.386   0.343*  0.342*
ACI       0.517   0.516   0.461   0.461   0.448   0.453   0.423   0.468   0.469

Table 15: Monte Carlo estimates of the mean width of confidence intervals for
$\beta^*_{1,0,1}$ (intercept term) at the 95% nominal level. Generative models have two stages and
three actions at the second stage. Estimates are constructed using 1000 datasets of each size
(150 and 300) drawn from each model, and 1000 bootstraps drawn from each dataset. Estimates
significantly below 0.95 at the 0.05 level are marked with *. Models are designated NR =
non-regular, NNR = near-non-regular, R = regular.
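The coverage and mean-width entries in these tables are Monte Carlo summaries: many datasets are drawn from a model, a bootstrap interval is built on each, and the hit rate and average width are recorded. The sketch below illustrates that loop on a deliberately simple problem (the mean of normal data), not on the Q-learning estimators of the paper. It assumes that CPB denotes the centered percentile bootstrap; the function names, the toy estimator, and the reduced replication counts are all illustrative choices.

```python
import random
import statistics


def centered_percentile_ci(data, estimator, n_boot, alpha, rng):
    """Centered percentile bootstrap interval for estimator(data)."""
    n = len(data)
    theta_hat = estimator(data)
    # Bootstrap distribution of the centered statistic theta*_b - theta_hat.
    centered = sorted(
        estimator([data[rng.randrange(n)] for _ in range(n)]) - theta_hat
        for _ in range(n_boot)
    )
    lower_q = centered[int((alpha / 2) * (n_boot - 1))]
    upper_q = centered[int((1 - alpha / 2) * (n_boot - 1))]
    # Subtracting the quantiles reflects them around the point estimate.
    return theta_hat - upper_q, theta_hat - lower_q


def monte_carlo_summary(n_datasets, n, n_boot, true_value=0.0, alpha=0.05, seed=0):
    """Estimate coverage and mean width over repeated simulated datasets."""
    rng = random.Random(seed)
    hits, widths = 0, []
    for _ in range(n_datasets):
        data = [rng.gauss(true_value, 1.0) for _ in range(n)]
        lo, hi = centered_percentile_ci(data, statistics.fmean, n_boot, alpha, rng)
        hits += lo <= true_value <= hi
        widths.append(hi - lo)
    return hits / n_datasets, statistics.fmean(widths)


if __name__ == "__main__":
    coverage, mean_width = monte_carlo_summary(n_datasets=200, n=50, n_boot=200)
    print(f"coverage={coverage:.3f}, mean width={mean_width:.3f}")
```

The tables use 1000 datasets and 1000 bootstrap resamples per dataset; the smaller defaults here only keep the example fast.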
N = 150   Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       (values illegible in the source)
PPE       0.951   0.951   0.951   0.951   0.925*  0.930*  0.937   0.954   0.954
ACI       0.987   0.986   0.977   0.976   0.941   0.955   0.947   0.969   0.969


N = 300   Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       0.935*  0.936   0.946   0.946   0.944   0.946   0.959   0.945   0.946
PPE       0.933*  0.935*  0.944   0.945   0.946   0.941   0.957   0.945   0.945
ACI       0.972   0.973   0.958   0.959   0.959   0.955   0.960   0.956   0.956

Table 16: Monte Carlo estimates of coverage probability of confidence intervals for
$\beta^*_{1,0,2}$ (main effect of history) at the 95% nominal level. Generative models have two
stages and three actions at the second stage. Estimates are constructed using 1000 datasets of
each size (150 and 300) drawn from each model, and 1000 bootstraps drawn from each dataset.
Estimates significantly below 0.95 at the 0.05 level are marked with *. Models are designated
NR = non-regular, NNR = near-non-regular, R = regular.



N = 150   Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       0.342   0.342   0.344   0.344   0.433*  0.379   0.390   0.336   0.336
PPE       0.340   0.340   0.343   0.343   0.428*  0.372*  0.386   0.336   0.336
ACI       0.410   0.410   0.382   0.382   0.469   0.398   0.399   0.362   0.362

N = 300   Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       0.236*  0.236   0.237   0.237   0.302   0.263   0.269   0.231   0.231
PPE       0.234*  0.234*  0.236   0.236   0.299   0.259   0.268   0.231   0.231
ACI       0.280   0.280   0.261   0.261   0.327   0.270   0.273   0.242   0.242

Table 17: Monte Carlo estimates of the mean width of confidence intervals for
$\beta^*_{1,0,2}$ (main effect of history) at the 95% nominal level. Generative models have two
stages and three actions at the second stage. Estimates are constructed using 1000 datasets of
each size (150 and 300) drawn from each model, and 1000 bootstraps drawn from each dataset.
Estimates significantly below 0.95 at the 0.05 level are marked with *. Models are designated
NR = non-regular, NNR = near-non-regular, R = regular.
N = 150   Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       (values illegible in the source)
PPE       (values illegible in the source)
ACI       0.999   0.999   0.968   0.970   0.964   0.972   0.964   0.970   0.971


N = 300   Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       0.950   0.948   0.926*  0.939   0.939   0.925*  0.943   0.927*  0.938
PPE       0.956   0.955   0.952   0.953   0.946   0.887*  0.943   0.937   0.938
ACI       0.999   0.999   0.965   0.971   0.961   0.974   0.966   0.967   0.970

Table 18: Monte Carlo estimates of coverage probability of confidence intervals for
$\beta^*_{1,1,1}$ (main effect of treatment) at the 95% nominal level. Generative models have two
stages and three actions at the second stage. Estimates are constructed using 1000 datasets of
each size (150 and 300) drawn from each model, and 1000 bootstraps drawn from each dataset.
Estimates significantly below 0.95 at the 0.05 level are marked with *. Models are designated
NR = non-regular, NNR = near-non-regular, R = regular.



N = 150   Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       0.446*  0.446   0.518*  0.518*  0.567*  0.518*  0.557   0.508*  0.507*
PPE       0.415*  0.415*  0.500*  0.500*  0.557*  0.486*  0.548*  0.467*  0.465*
ACI       0.716   0.716   0.663   0.663   0.643   0.643   0.625   0.673   0.673

N = 300   Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       0.306   0.306   0.358*  0.358   0.395   0.372*  0.387   0.357*  0.357
PPE       0.284   0.284   0.346   0.346   0.389   0.345*  0.386   0.337   0.337
ACI       0.497   0.497   0.461   0.461   0.448   0.436   0.423   0.462   0.462

Table 19: Monte Carlo estimates of the mean width of confidence intervals for
$\beta^*_{1,1,1}$ (main effect of treatment) at the 95% nominal level. Generative models have two
stages and three actions at the second stage. Estimates are constructed using 1000 datasets of
each size (150 and 300) drawn from each model, and 1000 bootstraps drawn from each dataset.
Estimates significantly below 0.95 at the 0.05 level are marked with *. Models are designated
NR = non-regular, NNR = near-non-regular, R = regular.
N = 150   Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       (values illegible in the source)
PPE       0.951   0.950   0.954   0.955   0.945   0.936   0.953   0.944   0.944
ACI       0.978   0.978   0.969   0.970   0.962   0.960   0.961   0.962   0.961


N = 300   Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       0.953   0.954   0.950   0.951   0.939   0.950   0.956   0.960   0.960
PPE       0.955   0.954   0.952   0.951   0.932*  0.940   0.952   0.959   0.960
ACI       0.985   0.986   0.975   0.975   0.955   0.954   0.958   0.968   0.968

Table 20: Monte Carlo estimates of coverage probability of confidence intervals for
$\beta^*_{1,1,2}$ (interaction between history and treatment) at the 95% nominal level. Generative
models have two stages and three actions at the second stage. Estimates are constructed using
1000 datasets of each size (150 and 300) drawn from each model, and 1000 bootstraps drawn
from each dataset. Estimates significantly below 0.95 at the 0.05 level are marked with *.
Models are designated NR = non-regular, NNR = near-non-regular, R = regular.



N = 150   Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       0.342   0.342   0.344   0.344   0.406   0.379   0.390   0.336   0.336
PPE       0.339   0.339   0.343   0.343   0.402   0.372   0.386   0.335   0.335
ACI       0.410   0.410   0.382   0.382   0.444   0.398   0.399   0.361   0.362

N = 300   Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       0.235   0.235   0.236   0.236   0.283   0.263   0.269   0.231   0.231
PPE       0.233   0.233   0.235   0.235   0.279*  0.259   0.268   0.231   0.231
ACI       0.280   0.280   0.261   0.261   0.308   0.269   0.272   0.242   0.242

Table 21: Monte Carlo estimates of the mean width of confidence intervals for
$\beta^*_{1,1,2}$ (interaction between history and treatment) at the 95% nominal level. Generative
models have two stages and three actions at the second stage. Estimates are constructed using
1000 datasets of each size (150 and 300) drawn from each model, and 1000 bootstraps drawn
from each dataset. Estimates significantly below 0.95 at the 0.05 level are marked with *.
Models are designated NR = non-regular, NNR = near-non-regular, R = regular.
N = 150   Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       (values illegible in the source)
PPE       (values illegible in the source)
ACI       0.994   0.994   0.967   0.970   0.950   0.966   0.954   0.961   0.964


N = 300   Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       0.951   0.952   0.932*  0.936   0.946   0.939   0.948   0.930*  0.939
PPE       0.960   0.963   0.949   0.951   0.941   0.911*  0.947   0.932*  0.933*
ACI       0.998   0.997   0.968   0.972   0.965   0.970   0.966   0.968   0.972

Table 22: Monte Carlo estimates of coverage probability of confidence intervals for
the contrast $\beta^*_{1,1,1} + \beta^*_{1,1,2}$ (effect of action for history = 1) at the 95% nominal level.
Generative models have two stages and three actions at the second stage. Estimates are constructed
using 1000 datasets of each size (150 and 300) drawn from each model, and 1000 bootstraps drawn
from each dataset. Estimates significantly below 0.95 at the 0.05 level are marked with *.
Models are designated NR = non-regular, NNR = near-non-regular, R = regular.



N = 150   Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       0.561*  0.561   0.620*  0.620*  0.691*  0.639*  0.676*  0.607*  0.607*
PPE       0.536   0.536   0.605*  0.605*  0.686*  0.612*  0.669*  0.575*  0.574*
ACI       0.833   0.833   0.767   0.767   0.734   0.762   0.741   0.773   0.774

N = 300   Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       0.386   0.386   0.428*  0.428   0.481   0.454   0.469   0.426*  0.425
PPE       0.367   0.368   0.418   0.418   0.478   0.430*  0.468   0.409*  0.409*
ACI       0.577   0.577   0.532   0.532   0.510   0.517   0.503   0.530   0.530

Table 23: Monte Carlo estimates of the mean width of confidence intervals for
the contrast $\beta^*_{1,1,1} + \beta^*_{1,1,2}$ (effect of action for history = 1) at the 95% nominal level.
Generative models have two stages and three actions at the second stage. Estimates are constructed
using 1000 datasets of each size (150 and 300) drawn from each model, and 1000 bootstraps drawn
from each dataset. Estimates significantly below 0.95 at the 0.05 level are marked with *.
Models are designated NR = non-regular, NNR = near-non-regular, R = regular.
N = 150   Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       (values illegible in the source)
PPE       (values illegible in the source)
ACI       0.997   0.997   0.977   0.979   0.967   0.977   0.961   0.978   0.979


N = 300   Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       0.944   0.942   0.923*  0.934*  0.938   0.936   0.942   0.931*  0.935*
PPE       0.948   0.948   0.948   0.952   0.947   0.910*  0.937   0.937   0.936
ACI       0.996   0.996   0.964   0.970   0.960   0.961   0.960   0.967   0.969

Table 24: Monte Carlo estimates of coverage probability of confidence intervals for
the contrast $\beta^*_{1,1,1} - \beta^*_{1,1,2}$ (effect of action for history = -1) at the 95% nominal level.
Generative models have two stages and three actions at the second stage. Estimates are constructed
using 1000 datasets of each size (150 and 300) drawn from each model, and 1000 bootstraps drawn
from each dataset. Estimates significantly below 0.95 at the 0.05 level are marked with *.
Models are designated NR = non-regular, NNR = near-non-regular, R = regular.



N = 150   Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       0.561   0.561   0.624   0.623   0.702*  0.644*  0.682   0.610*  0.610*
PPE       0.535   0.535   0.607   0.607   0.686*  0.610*  0.672*  0.576*  0.575*
ACI       0.834   0.833   0.767   0.767   0.816   0.769   0.751   0.777   0.777

N = 300   Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       0.386   0.386   0.430*  0.430*  0.490   0.458   0.473   0.425*  0.425*
PPE       0.367   0.367   0.418   0.419   0.479   0.432*  0.471   0.409   0.408
ACI       0.578   0.578   0.531   0.531   0.569   0.522   0.510   0.530   0.530

Table 25: Monte Carlo estimates of the mean width of confidence intervals for
the contrast $\beta^*_{1,1,1} - \beta^*_{1,1,2}$ (effect of action for history = -1) at the 95% nominal level.
Generative models have two stages and three actions at the second stage. Estimates are constructed
using 1000 datasets of each size (150 and 300) drawn from each model, and 1000 bootstraps drawn
from each dataset. Estimates significantly below 0.95 at the 0.05 level are marked with *.
Models are designated NR = non-regular, NNR = near-non-regular, R = regular.
4.3 Models with three stages

Here, we present results using another suite of examples, again similar to those of Chakraborty
et al. (2009), but that have three stages of treatment, with binary treatments at each stage.
These models are defined as follows:

• $X_i \in \{-1, 1\}$, $A_i \in \{-1, 1\}$ for $i \in \{1, 2, 3\}$

• $P(A_i = 1) = P(A_i = -1) = 0.5$ for $i \in \{1, 2, 3\}$

• $P(X_1 = 1) = P(X_1 = -1) = 1/2$,

  $P(X_{i+1} = 1 \mid X_i, A_i) = \mathrm{expit}(\delta_1 X_i + \delta_2 A_i)$ for $i \in \{1, 2\}$

• $Y_3 = \xi_1 + \xi_2 X_1 + \xi_3 A_1 + \xi_4 X_1 A_1 +$

  $\xi_5 A_2 + \xi_6 X_2 A_2 + \xi_7 A_1 A_2 +$

  $\xi_8 A_3 + \xi_9 X_3 A_3 + \xi_{10} A_2 A_3 + \epsilon$,
  $\epsilon \sim N(0, 1)$

where $\mathrm{expit}(x) = e^x / (1 + e^x)$. This class is parameterized by twelve values
$\xi_1, \xi_2, \ldots, \xi_{10}, \delta_1, \delta_2$. The analysis model uses histories defined by:



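$H_{3,0} = (1, X_1, A_1, X_1 A_1, X_2, A_2, X_2 A_2, A_1 A_2, X_3)^T$    (36)

$H_{3,1} = (1, X_3, A_2)^T$    (37)

$H_{2,0} = (1, X_1, A_1, X_1 A_1, X_2)^T$    (38)

$H_{2,1} = (1, X_2, A_1)^T$    (39)

$H_{1,0} = (1, X_1)^T$    (40)

$H_{1,1} = (1, X_1)^T$    (41)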
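The generative model of this section can be simulated directly. The sketch below assumes the outcome equation reads $Y_3 = \xi_1 + \xi_2 X_1 + \xi_3 A_1 + \xi_4 X_1 A_1 + \xi_5 A_2 + \xi_6 X_2 A_2 + \xi_7 A_1 A_2 + \xi_8 A_3 + \xi_9 X_3 A_3 + \xi_{10} A_2 A_3 + \epsilon$, which is a reconstruction of a damaged equation in the source. The function names and the dictionary layout of a trajectory are illustrative, not taken from the paper.

```python
import math
import random


def expit(x: float) -> float:
    """Logistic function e^x / (1 + e^x)."""
    return 1.0 / (1.0 + math.exp(-x))


def draw_pm1(prob_plus: float, rng: random.Random) -> int:
    """Draw +1 with probability prob_plus, otherwise -1."""
    return 1 if rng.random() < prob_plus else -1


def simulate_trajectory(xi, delta, rng):
    """Simulate one three-stage trajectory.

    xi    : sequence of 10 coefficients (xi_1, ..., xi_10)
    delta : pair (delta_1, delta_2) driving the covariate transitions
    """
    x1 = draw_pm1(0.5, rng)
    a1 = draw_pm1(0.5, rng)  # treatments are randomized with probability 1/2
    x2 = draw_pm1(expit(delta[0] * x1 + delta[1] * a1), rng)
    a2 = draw_pm1(0.5, rng)
    x3 = draw_pm1(expit(delta[0] * x2 + delta[1] * a2), rng)
    a3 = draw_pm1(0.5, rng)
    # Regressors in the same order as xi_1, ..., xi_10 (assumed reading).
    features = [1, x1, a1, x1 * a1, a2, x2 * a2, a1 * a2, a3, x3 * a3, a2 * a3]
    y = sum(c * f for c, f in zip(xi, features)) + rng.gauss(0.0, 1.0)
    return {"x1": x1, "a1": a1, "x2": x2, "a2": a2, "x3": x3, "a3": a3, "y": y}


if __name__ == "__main__":
    rng = random.Random(0)
    xi_example_3 = [0, 0, -0.5, 0, 0, 0, 0.5, 0.5, 0, 0.5]
    dataset = [simulate_trajectory(xi_example_3, (0.5, 0.5), rng) for _ in range(150)]
    print(dataset[0])
```

A dataset of size 150 or 300 is then just a list of such trajectories, to which the stagewise regressions on the histories above would be applied.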
Example   $\xi$                                                      $\delta$         Stage 2 & 3 Regularity
1         $(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)^T$                         $(0.5, 0.5)^T$   $p = 1$, $\phi = 0/0$
2         $(0, 0, 0, 0, 0.01, 0, 0, 0.01, 0, 0)^T$                   $(0.5, 0.5)^T$   $p = 0$, $\phi = \infty$
3         $(0, 0, -0.5, 0, 0, 0, 0.5, 0.5, 0, 0.5)^T$                $(0.5, 0.5)^T$   $p = 1/2$, $\phi = 1.003$
4         $(0, 0, -0.5, 0, 0, 0, 0.49, 0.5, 0, 0.49)^T$              $(0.5, 0.5)^T$   $p = 0$, $\phi = 1.014$
5         $(0, 0, -0.5, 0, 0.5, 0.5, 0.5, 1.0, 0.5, 0.5)^T$          $(1.0, 0.0)^T$   $p = 1/4$, $\phi = 1.40$
6         $(0, 0, -0.5, 0, 0.12, 0.48, 0.50, 0.25, 0.5, 0.5)^T$      $(0.1, 0.1)^T$   $p = 0$, $\phi = 0.349$
A         $(0, 0, -0.25, 0, 0.36, 0.49, 0.50, 0.75, 0.5, 0.5)^T$     $(0.1, 0.1)^T$   $p = 0$, $\phi = 1.05$
B         $(0, 0, 0, 0, 0, 0, 0.25, 0.25, 0, 0.25)^T$                $(0, 0)^T$       $p = 1/2$, $\phi = 1.00$
C         $(0, 0, 0, 0, 0, 0, 0.24, 0.25, 0, 0.24)^T$                $(0, 0)^T$       $p = 0$, $\phi = 1.03$

Table 26: Parameters indexing the example models.



The values of the constants $\xi_1, \ldots, \xi_{10}$ and $\delta_1, \delta_2$ for each example are given in Table 26.
Since the third stage of these models has the same structure and parameters as the second
stage of the two-stage models in Chakraborty et al. (2009), the measures $p$ and $\phi$ of
non-regularity for the final stages in both suites of examples are exactly the same.

We have chosen parameters $\xi_5, \xi_6, \xi_7$ so that the measures of non-regularity at stage 2 are
exactly the same as they are at stage 3. The coefficients and regularity properties for both
stages are given in Table 26.
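As a check on how the coefficients determine regularity, the sketch below computes a stage-3 value of $p$ by exact enumeration. It assumes that $p$ is the probability that the stage-3 treatment contrast $\xi_8 + \xi_9 X_3 + \xi_{10} A_2$ is exactly zero; that definition is not restated in this section, so it is an assumption, as are the function names. Under this reading the function returns 1, 1/2 and 1/4 (up to floating-point rounding) for Examples 1, 3 and 5, in agreement with Table 26.

```python
import math
from itertools import product


def expit(x: float) -> float:
    """Logistic function e^x / (1 + e^x)."""
    return 1.0 / (1.0 + math.exp(-x))


def stage3_p(xi, delta, tol: float = 1e-12) -> float:
    """P(xi_8 + xi_9 * X3 + xi_10 * A2 = 0), enumerating (X1, A1, X2, A2, X3).

    Assumed definition of the non-regularity measure p at the final stage.
    """
    total = 0.0
    for x1, a1, x2, a2, x3 in product((-1, 1), repeat=5):
        p_x2 = expit(delta[0] * x1 + delta[1] * a1)  # P(X2 = 1 | X1, A1)
        p_x3 = expit(delta[0] * x2 + delta[1] * a2)  # P(X3 = 1 | X2, A2)
        prob = (
            0.5 * 0.5                                # X1 and A1 are uniform
            * (p_x2 if x2 == 1 else 1.0 - p_x2)
            * 0.5                                    # A2 is uniform
            * (p_x3 if x3 == 1 else 1.0 - p_x3)
        )
        contrast = xi[7] + xi[8] * x3 + xi[9] * a2
        if abs(contrast) < tol:
            total += prob
    return total


if __name__ == "__main__":
    example_5 = [0, 0, -0.5, 0, 0.5, 0.5, 0.5, 1.0, 0.5, 0.5]
    print(stage3_p(example_5, (1.0, 0.0)))
```

Enumeration is exact here because all covariates and treatments are binary, so only 32 configurations need to be visited.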

Tables 27 through 38 detail our results for these models. Note that the original work of 
Chakraborty et al. (2009) did not define the Soft Threshold (ST) method for more than two 
stages, so we make the most obvious extension. To produce an ST confidence interval for $\beta_1$
for a 3-stage problem, we bootstrap a shrunken estimate of $\beta_1$ that is based on the standard
Q-learning estimate of We then use the hybrid bootstrap to produce the CI. 
N = 150   Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       (values illegible in the source)
PPE       (values illegible in the source)
ST        (values illegible in the source)
ACI       (values illegible in the source)


N = 300   Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       0.833*  0.879*  0.933*  0.937   0.951   0.934*  0.944   0.934*  0.934*
PPE       0.931*  0.949   0.952   0.950   0.955   0.904*  0.952   0.926*  0.925*
ST        0.930*  0.945   0.944   0.936   0.910*  0.615*  0.810*  0.871*  0.856*
ACI       0.952   0.970   0.971   0.971   0.971   0.957   0.959   0.971   0.972

Table 27: Monte Carlo estimates of coverage probability of confidence intervals for
$\beta^*_{1,0,1}$ (intercept term) at the 95% nominal level. Generative models have three stages and two
actions per stage. Estimates are constructed using 1000 datasets of each size (150 and 300) drawn
from each model, and 1000 bootstraps drawn from each dataset. Estimates significantly
below 0.95 at the 0.05 level are marked with *. Models are designated NR = non-regular,
NNR = near-non-regular, R = regular.



N = 150   Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       0.503*  0.503*  0.563*  0.563*  0.660*  0.591*  0.625*  0.555*  0.555*
PPE       0.481*  0.481   0.560*  0.560*  0.660*  0.572*  0.641*  0.531*  0.529*
ST        0.438   0.438   0.588*  0.590*  0.699*  0.603*  0.682*  0.572*  0.570*
ACI       0.657   0.657   0.659   0.659   0.728   0.679   0.694   0.673   0.674

N = 300   Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       0.345*  0.346*  0.389*  0.390   0.459   0.417*  0.428   0.388*  0.389*
PPE       0.330*  0.330   0.387   0.387   0.459   0.401*  0.446   0.381*  0.380*
ST        0.298*  0.298   0.397   0.398   0.472*  0.430*  0.447*  0.415*  0.416*
ACI       0.453   0.453   0.456   0.456   0.508   0.459   0.467   0.459   0.460

Table 28: Monte Carlo estimates of the mean width of confidence intervals for
$\beta^*_{1,0,1}$ (intercept term) at the 95% nominal level. Generative models have three stages and two
actions per stage. Estimates are constructed using 1000 datasets of each size (150 and 300) drawn
from each model, and 1000 bootstraps drawn from each dataset. Estimates significantly
below 0.95 at the 0.05 level are marked with *. Models are designated NR = non-regular,
NNR = near-non-regular, R = regular.
N = 150   (table body not recoverable from the source)

N = 300   Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       0.963   0.964   0.969   0.968   0.956   0.953   0.957   0.964   0.965
PPE       0.964   0.963   0.969   0.969   0.956   0.949   0.956   0.965   0.966
ST        0.967   0.965   0.968   0.968   0.963   0.945   0.959   0.963   0.963
ACI       0.975   0.973   0.974   0.974   0.964   0.955   0.958   0.971   0.971

Table 29: Monte Carlo estimates of coverage probability of confidence intervals for
$\beta^*_{1,0,2}$ (main effect of history) at the 95% nominal level. Generative models have three stages
and two actions per stage. Estimates are constructed using 1000 datasets of each size (150 and
300) drawn from each model, and 1000 bootstraps drawn from each dataset. Estimates
significantly below 0.95 at the 0.05 level are marked with *. Models are designated NR =
non-regular, NNR = near-non-regular, R = regular.



N = 150   Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       0.339   0.340   0.340   0.340   0.441   0.384   0.388   0.336   0.336
PPE       0.338   0.338   0.340   0.340   0.438   0.379   0.385   0.336   0.336
ST        0.337   0.337   0.340   0.340   0.451   0.391   0.397   0.336   0.336
ACI       0.367   0.367   0.355   0.355   0.465   0.396   0.394   0.352   0.352

N = 300   Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       0.234   0.234   0.233   0.234   0.307   0.265   0.267   0.231   0.231
PPE       0.233   0.233   0.234   0.234   0.306   0.263   0.266   0.231   0.231
ST        0.233   0.233   0.234   0.234   0.312   0.271   0.271   0.231   0.231
ACI       0.249   0.249   0.242   0.242   0.324   0.269   0.269   0.237   0.237

Table 30: Monte Carlo estimates of the mean width of confidence intervals for
$\beta^*_{1,0,2}$ (main effect of history) at the 95% nominal level. Generative models have three stages
and two actions per stage. Estimates are constructed using 1000 datasets of each size (150 and
300) drawn from each model, and 1000 bootstraps drawn from each dataset. Estimates
significantly below 0.95 at the 0.05 level are marked with *. Models are designated NR =
non-regular, NNR = near-non-regular, R = regular.
N = 300   Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       0.957   0.953   0.929*  0.936   0.935*  0.913*  0.949   0.928*  0.924*
PPE       0.960   0.962   0.934*  0.940   0.945   0.903*  0.931*  0.903*  0.898*
ST        0.953   0.951   0.920*  0.923*  0.947   0.911*  0.906*  0.872*  0.881*
ACI       0.981   0.979   0.952   0.956   0.958   0.943   0.954   0.946   0.947

Table 31: Monte Carlo estimates of coverage probability of confidence intervals for
$\beta^*_{1,1,1}$ (main effect of treatment) at the 95% nominal level. Generative models have three
stages and two actions per stage. Estimates are constructed using 1000 datasets of each size
(150 and 300) drawn from each model, and 1000 bootstraps drawn from each dataset. Estimates
significantly below 0.95 at the 0.05 level are marked with *. Models are designated NR =
non-regular, NNR = near-non-regular, R = regular.
N = 150   Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       0.406   0.406   0.485*  0.485*  0.530   0.496*  0.519   0.474*  0.472*
PPE       0.386   0.386   0.466*  0.465*  0.523   0.477*  0.511*  0.427*  0.425*
ST        0.400   0.400   0.489*  0.489*  0.544   0.517*  0.545*  0.484*  0.481*
ACI       0.476   0.476   0.530   0.530   0.574   0.553   0.558   0.525*  0.524*

N = 300   Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       0.279   0.279   0.335*  0.334   0.371*  0.352*  0.362   0.333*  0.332*
PPE       0.265   0.265   0.321*  [...]   0.366   0.338*  0.360*  0.311*  0.309*
ST        0.274   0.275   0.329*  0.328*  0.375   0.368*  0.373*  0.344*  0.343*
ACI       0.323   0.323   0.364   0.363   0.401   0.381   0.380   0.362   0.362

Table 32: Monte Carlo estimates of the mean width of confidence intervals for
$\beta^*_{1,1,1}$ (main effect of treatment) at the 95% nominal level. Generative models have three
stages and two actions per stage. Estimates are constructed using 1000 datasets of each size
(150 and 300) drawn from each model, and 1000 bootstraps drawn from each dataset. Estimates
significantly below 0.95 at the 0.05 level are marked with *. Models are designated NR =
non-regular, NNR = near-non-regular, R = regular.
          Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       0.948   0.949   0.952   0.953   0.954   0.947   0.961   0.953   0.954
PPE       0.951   0.951   0.949   0.949   0.954   0.946   0.959   0.952   0.953
ST        0.950   0.951   0.948   0.950   0.956   0.945   0.963   0.951   0.951
ACI       0.967   0.966   0.959   0.959   0.964   0.949   0.962   0.960   0.960

Table 33: Monte Carlo estimates of coverage probability of confidence intervals for
$\beta^*_{1,1,2}$ (interaction between history and treatment) at the 95% nominal level. Generative
models have three stages and two actions per stage. Estimates are constructed using 1000
datasets of each size (150 and 300) drawn from each model, and 1000 bootstraps drawn from each
dataset. Estimates significantly below 0.95 at the 0.05 level are marked with *. Models are
designated NR = non-regular, NNR = near-non-regular, R = regular.
N = 150   (column headers and some cells are missing in the source; gaps are shown as [...])
CPB       0.339   0.339   0.342   [...]   0.385   0.389   0.337   0.337
PPE       [...]   0.338   0.341   0.341   0.421   [...]   0.386   0.337   0.337
ST        0.338   0.338   [...]   0.392   [...]   0.337   0.337
N = 300   Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       [...]   [...]   [...]   0.234   0.296   0.264   0.266   0.230   0.230
PPE       0.231   0.231   0.233   0.233   0.293   0.262   0.265   0.230   0.230
ST        0.232   0.232   0.234   0.234   0.299   0.270   0.270   0.231   0.231
ACI       0.247   0.247   0.242   0.242   0.311   0.268   0.268   0.237   0.237

Table 34: Monte Carlo estimates of the mean width of confidence intervals for
$\beta^*_{1,1,2}$ (interaction between history and treatment) at the 95% nominal level. Generative
models have three stages and two actions per stage. Estimates are constructed using 1000
datasets of each size (150 and 300) drawn from each model, and 1000 bootstraps drawn from each
dataset. Estimates significantly below 0.95 at the 0.05 level are marked with *. Models are
designated NR = non-regular, NNR = near-non-regular, R = regular.
N = 150   Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       (values illegible in the source)
PPE       (values illegible in the source)
ST        (values illegible in the source)
ACI       (values illegible in the source)


N = 300   Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       0.953   0.956   0.940   0.947   0.946   0.942   0.946   0.933*  0.934*
PPE       0.959   0.960   0.946   0.950   0.945   0.933*  0.944   0.920*  0.922*
ST        0.954   0.957   0.930*  0.931*  0.949   0.939   0.935*  0.896*  0.899*
ACI       0.979   0.980   0.960   0.964   0.955   0.960   0.959   0.954   0.957

Table 35: Monte Carlo estimates of coverage probability of confidence intervals for
the contrast $\beta^*_{1,1,1} + \beta^*_{1,1,2}$ (effect of action for history = 1) at the 95% nominal level.
Generative models have three stages and two actions per stage. Estimates are constructed using
1000 datasets of each size (150 and 300) drawn from each model, and 1000 bootstraps drawn from
each dataset. Estimates significantly below 0.95 at the 0.05 level are marked with *. Models
are designated NR = non-regular, NNR = near-non-regular, R = regular.



N = 150   Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       0.529   0.529   0.593*  0.592*  0.665   0.624*  0.644*  0.581*  0.579*
PPE       0.511   0.511   0.575*  0.575*  0.664   0.607*  0.639*  0.542*  0.541*
ST        0.522   0.523   0.595*  0.595*  0.683   0.645*  0.669*  0.589*  0.586*
ACI       0.605   0.605   0.641   0.641   0.693   0.682   0.682   0.638   0.637

N = 300   Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
          NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB       0.363   0.363   0.407   0.407   0.465   0.438   0.446   0.405*  0.404*
PPE       0.352   0.352   0.396   0.396   0.464   0.426*  0.445   0.386*  0.385*
ST        0.360   0.360   0.403*  0.402*  0.472   0.454   0.458*  0.415*  0.414*
ACI       0.411   0.411   0.438   0.438   0.483   0.467   0.463   0.436   0.436

Table 36: Monte Carlo estimates of the mean width of confidence intervals for
the contrast $\beta^*_{1,1,1} + \beta^*_{1,1,2}$ (effect of action for history = 1) at the 95% nominal level.
Generative models have three stages and two actions per stage. Estimates are constructed using
1000 datasets of each size (150 and 300) drawn from each model, and 1000 bootstraps drawn from
each dataset. Estimates significantly below 0.95 at the 0.05 level are marked with *. Models
are designated NR = non-regular, NNR = near-non-regular, R = regular.
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N = 150 
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U.oi / 


APT 


u.yoo 


u.yoD 




u.yoi 


u.yoi 


u.yoz 


u.yoi 




u.yoy 


N = 300
        Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
        NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB     0.943   0.945   0.932*  0.936   0.935*  0.929*  0.941   0.931*  0.934*
PPE     0.946   0.945   0.937   0.939   0.943   0.908*  0.926*  0.922*  0.920*
ST      0.945   0.947   0.930*  0.930*  0.955   0.919*  0.921*  0.895*  0.896*
ACI     0.973   0.970   0.949   0.953   0.956   0.947   0.952   0.949   0.950



Table 37: Monte Carlo estimates of coverage probability of confidence intervals for 
the contrast $\beta_{1,1,1} - \beta_{1,1,2}$ (effect of action for history = -1) at the 95% nominal level. 
Generative models have three stages and two actions per stage. Estimates are constructed using 
1000 datasets of each size (150 and 300) drawn from each model, and 1000 bootstrap samples drawn from 
each dataset. Estimates significantly below 0.95 at the 0.05 level are marked with *. Models 
are designated NR = non-regular, NNR = near-non-regular, R = regular. 
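
The marking rule in the caption can be illustrated with a short sketch. The caption does not say which test produces the asterisks, so the one-sided exact binomial test below is an assumption, and the function names (`lower_tail`, `flag_undercoverage`) are hypothetical. It treats the number of Monte Carlo replications whose interval covers the true contrast as Binomial(1000, 0.95) under the null hypothesis of nominal coverage, and flags an estimate when the lower-tail probability falls below 0.05.

```python
from math import exp, lgamma, log


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Log of P(X = k) for X ~ Binomial(n, p), using log-gamma for stability."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))


def lower_tail(k: int, n: int, p: float) -> float:
    """Exact lower-tail probability P(X <= k) for X ~ Binomial(n, p)."""
    return sum(exp(log_binom_pmf(i, n, p)) for i in range(k + 1))


def flag_undercoverage(covered: int, n_rep: int = 1000,
                       nominal: float = 0.95, alpha: float = 0.05) -> bool:
    """True if the observed coverage count is significantly below nominal.

    Assumption: a one-sided exact binomial test; the source does not
    specify the test used to place the asterisks.
    """
    return lower_tail(covered, n_rep, nominal) < alpha


# 895 of 1000 intervals covering (estimate 0.895) is flagged;
# 950 of 1000 (estimate 0.950) is not.
print(flag_undercoverage(895), flag_undercoverage(950))
```

Under a different test (for example, a two-sided test or a normal approximation) the cutoff for entries close to 0.935 may shift, so this sketch should not be read as reproducing the asterisks in the table exactly.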



N = 150
        Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
        NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB     0.530   0.530   0.595*  0.594*  0.693   0.631*  0.652   0.582*  0.580*
PPE     0.513   0.513   0.577*  0.577*  0.679   0.609*  0.641*  0.544*  0.543*
ST      0.524*  0.524*  0.600*  0.599*  0.708   0.653*  0.678*  0.590*  0.587*
ACI     0.605   0.605   0.643   0.642   0.755   0.687   0.690   0.639   0.638

N = 300
        Ex. 1   Ex. 2   Ex. 3   Ex. 4   Ex. 5   Ex. 6   Ex. A   Ex. B   Ex. C
        NR      NNR     NR      NNR     NR      R       R       NR      NNR
CPB     0.363   0.363   0.409*  0.408   0.483*  0.443*  0.451   0.405*  0.404*
PPE     0.351   0.351   0.398   0.398   0.474   0.427*  0.448*  0.387*  0.385*
ST      0.359   0.359   0.405*  0.404*  0.486   0.459*  0.463*  0.414*  0.414*
ACI     0.410   0.410   0.440   0.440   0.527   0.470   0.469   0.436   0.436



Table 38: Monte Carlo estimates of the mean width of confidence intervals for 
the contrast $\beta_{1,1,1} - \beta_{1,1,2}$ (effect of action for history = -1) at the 95% nominal level. 
Generative models have three stages and two actions per stage. Estimates are constructed using 
1000 datasets of each size (150 and 300) drawn from each model, and 1000 bootstrap samples drawn from 
each dataset. Estimates significantly below 0.95 at the 0.05 level are marked with *. Models 
are designated NR = non-regular, NNR = near-non-regular, R = regular. 